...

This section discusses how the proposed and rejected options work with a few use cases.

Mirror maker

The behavior of the broker is the same for all the options: the broker always overrides the LogAppendTime (if it exists) when a message arrives and keeps the CreateTime (if it exists) untouched.

The broker does not distinguish mirror maker from other producers. The following example shows what the timestamps look like when mirror maker is in the picture (CT = CreateTime, LAT = LogAppendTime).

  1. The application producer produces a message at T0. ( [CT = T0, LAT = -1] )
  2. The broker in cluster1 receives the message at T1 = T0 + latency1 and appends it to the log. Latency1 includes linger.ms and some other latency. ( [CT = T0, LAT = T1] )
  3. Mirror maker copies the message to the broker in cluster2. ( [CT = T0, LAT = T1] )
  4. The broker in cluster2 receives the message at T2 = T1 + latency2 and appends it to the log. ( [CT = T0, LAT = T2] )

The CreateTime of a message in the source cluster and the target cluster will be the same, i.e. the timestamp is passed across clusters.

The LogAppendTime of a message in the source cluster and the target cluster will be different.
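
The four steps of the mirror maker example can be sketched as a small simulation. This is only an illustrative model, not broker code: the `Message` class, the `broker_append` and `mirror` helpers, and the concrete millisecond values are all hypothetical names and numbers chosen for the sketch.

```python
from dataclasses import dataclass, replace

NO_TIMESTAMP = -1  # the example's placeholder for "LAT not set yet"


@dataclass(frozen=True)
class Message:
    create_time: int      # CT, set once by the application producer
    log_append_time: int  # LAT, overwritten by each broker that appends the message


def broker_append(message: Message, broker_clock: int) -> Message:
    """Hypothetical model of a broker append: override LAT, leave CT untouched."""
    return replace(message, log_append_time=broker_clock)


def mirror(message: Message) -> Message:
    """Mirror maker is treated like any other producer: it forwards the message as is."""
    return message


# Hypothetical times in milliseconds (assumptions for illustration only).
T0, LATENCY1, LATENCY2 = 1_000, 15, 40

produced = Message(create_time=T0, log_append_time=NO_TIMESTAMP)  # step 1
in_cluster1 = broker_append(produced, T0 + LATENCY1)              # step 2: LAT = T1
forwarded = mirror(in_cluster1)                                   # step 3: unchanged
in_cluster2 = broker_append(                                      # step 4: LAT = T2
    forwarded, in_cluster1.log_append_time + LATENCY2
)
```

In this model `in_cluster1.create_time` and `in_cluster2.create_time` are equal, while the two `log_append_time` values differ by `LATENCY2`, which matches the two observations above.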

Comparison


The following options are compared:

  • Proposed option: message contains CreateTime + LogAppendTime.
  • Rejected option 1: message contains LogAppendTime only.
  • Rejected option 2: message contains CreateTime only; brokers keep LogAppendTime in the log index.

Use case: Mirror Maker

  • Proposed option: Broker overrides the LAT and keeps the CT as is.
  • Rejected option 1: Broker overrides the LAT.
  • Rejected option 2: Broker keeps the CT as is and adds an index entry with the LAT to the log index file.
  • Comparison: The proposed option provides the most information to the user. The only concern is whether we should expose the LAT to the user. Rejected option 1 loses the CreateTime information. Rejected option 2 has the same amount of information as the proposed option from the broker's point of view. From the user's point of view, it does not expose the LAT.

Use case: Log retention

  • Proposed option: Broker will use the LAT of the last message in a segment to enforce the policy.
  • Rejected option 1: Same as proposed option.
  • Rejected option 2: Broker will use the LAT of the last entry in the log index file to enforce the retention policy. Because the leader is the source of truth for the LAT, followers need to get the LAT from the leader when they replicate the messages. That means we need to introduce a new wire protocol to fetch the time-based log index file as well. When log recovery happens, the rebuilt time index will have LATs that differ from the actual arrival times of the messages in the log, and the LATs in the index file will be very close to each other, or even the same.
  • Comparison: The proposed option and rejected option 1 can work with the existing replication design and solve the log retention issue we have now. Rejected option 2 alone cannot solve the problem we have now. We need an additional replication protocol to solve the log retention problem.

Use case: Log rolling

  • Proposed option: Broker will use the LAT of the first message in a log segment to enforce the policy.
  • Rejected option 1: Same as proposed option.
  • Rejected option 2: Broker will use the LAT of the first entry in the log index file to enforce the policy. Similar to the log retention case, the followers need to replicate the time index as well. If log recovery happens, the log rolling might not be honored either.
  • Comparison: The proposed option and rejected option 1 solve the log rolling issue. Rejected option 2 does not solve the problem and needs an additional replication protocol.

Use case: Stream processing

  • Proposed option: Applications don't need to include the CreateTime in the payload but can simply use the CreateTime field.
  • Rejected option 1: Applications have to put the CreateTime into the payload.
  • Rejected option 2: Applications don't need to include the CreateTime in the payload but can simply use the CreateTime field.
  • Comparison: The benefit of having a CreateTime with each message, rather than putting it into the payload, is that the application protocol can be simplified. It is convenient for the infrastructure to provide the timestamp so that there is no need for each application to worry about it.

Use case: Latency measurement

  • Proposed option: User can get End2End latency and lag in time.
  • Rejected option 1: User can get the lag in time.
  • Rejected option 2: User can get End2End latency.
  • Comparison: The proposed option has the most information for the user.

Use case: Search message by timestamp

  • Proposed option: Detailed discussion in KIP-33.
  • Rejected option 1: Detailed discussion in KIP-33.
  • Rejected option 2: Detailed discussion in KIP-33.
  • Comparison: Detailed discussion in KIP-33.
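
As a rough illustration of how the proposed option uses the two timestamps, the sketch below models the log retention, log rolling, and latency measurement cases. It is a minimal sketch under stated assumptions, not the broker implementation: the `Message` class, all function names, the `>=` comparisons, and the exact definitions of "End2End latency" (consume time minus CT) and "lag in time" (consume time minus LAT) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    create_time: int      # CT in milliseconds
    log_append_time: int  # LAT in milliseconds


def segment_expired(segment: List[Message], now_ms: int, retention_ms: int) -> bool:
    """Log retention: decided by the LAT of the LAST message in the segment."""
    # Assumption: a segment expires once its newest message is at least retention_ms old.
    return bool(segment) and now_ms - segment[-1].log_append_time >= retention_ms


def should_roll(active_segment: List[Message], now_ms: int, roll_ms: int) -> bool:
    """Log rolling: decided by the LAT of the FIRST message in the segment."""
    # Assumption: roll once the oldest message in the active segment is at least roll_ms old.
    return bool(active_segment) and now_ms - active_segment[0].log_append_time >= roll_ms


def end_to_end_latency(message: Message, consume_time_ms: int) -> int:
    """Assumed definition: time from message creation to consumption (needs CT)."""
    return consume_time_ms - message.create_time


def lag_in_time(message: Message, consume_time_ms: int) -> int:
    """Assumed definition: time the message has spent in the log (needs LAT)."""
    return consume_time_ms - message.log_append_time


# Hypothetical segment with two messages.
segment = [Message(create_time=1_000, log_append_time=1_015),
           Message(create_time=2_000, log_append_time=2_020)]

expired = segment_expired(segment, now_ms=12_020, retention_ms=10_000)
roll = should_roll(segment, now_ms=6_015, roll_ms=5_000)
latency = end_to_end_latency(segment[0], consume_time_ms=1_100)
lag = lag_in_time(segment[0], consume_time_ms=1_100)
```

Both policy checks read only the LAT stored in the messages themselves, so in this model a follower that replicates the messages can make the same decisions without any separately replicated time index. Only an option that carries both CT and LAT can compute both `end_to_end_latency` and `lag_in_time`.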

 

Discussion: should we use CreateTime OR LogAppendTime for log retention and time based log rolling?

...