Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

There are both pros and cons for log retention to be based on CreateTime or Receive timeLogAppendTime.

  • Pattern 1:
    Because the latency is small, so CreateTime and LogAppendTime will be close and their won't be much difference.
  • pattern 2:
    If the log retention is based on the message creation time, it will not be affected by the latency in the data pipeline because the send time will not change.
    If the log retention is based on the receive timeLogAppendTime, it will be affected by the latency in the pipeline. Because of the latency difference, some data can be deleted on one cluster, but not on another cluster in the pipeline.
  • Pattern 3: 
    When the messages with significantly different timestamp goes into a cluster at around same time, the retention policy is hard to follow if we use CreateTime. For example, imagine two mirror makers copy data from two source clusters to the same target cluster. If MirrorMaker1 is copying Messages with CreateTime around 1:00 PM today, and MirrorMaker2 is copying messages with CreateTime around 1:00 PM yesterday. Those messages can go to the same log segment in the target cluster. It will be difficult for broker to apply retention policy to the log segment. The broker needs to maintain the knowledge about the latest CreateTime of all messages in a log segment and persist the information somewhere.
  • Robustness: 
    If there is a message with CreateTime set to the future, the log might be kept for very long. Broker needs to sanity check the timestamp when receive the message. It could by tricky to determine which timestamp is not valid

...

 pattern 1pattern 2pattern 3Robustness
PreferenceCT = or LATCT > LATCT < LATCT < LAT

In reality, we usually don't see all the pipeline has same large latency, so it looks LogAppendTime is preferable than CreateTime for log retention.

...

 pattern 1pattern 2pattern 3Robustness
PreferenceCT = or LATCT < LATCT < LATCT < LAT

Application use cases

Stream Processing

...