...
There are pros and cons to basing log retention on either CreateTime or LogAppendTime.
- Pattern 1:
Because the latency is small, CreateTime and LogAppendTime will be close, so there won't be much difference between them.
- Pattern 2:
If log retention is based on CreateTime, it is not affected by latency in the data pipeline, because the CreateTime does not change as the message moves through the pipeline.
If log retention is based on LogAppendTime, it is affected by latency in the pipeline. Because of the latency difference, some data can be deleted on one cluster but not on another cluster in the pipeline.
- Pattern 3:
When messages with significantly different timestamps go into a cluster at around the same time, the retention policy is hard to follow if we use CreateTime. For example, imagine two mirror makers copying data from two source clusters to the same target cluster. If MirrorMaker1 is copying messages with CreateTime around 1:00 PM today and MirrorMaker2 is copying messages with CreateTime around 1:00 PM yesterday, those messages can go to the same log segment in the target cluster. It will be difficult for the broker to apply the retention policy to that log segment. The broker needs to keep track of the latest CreateTime of all messages in a log segment and persist that information somewhere.
- Robustness:
If there is a message with CreateTime set to the future, the log might be kept for a very long time. The broker needs to sanity check the timestamp when it receives the message. It could be tricky to determine which timestamps are not valid.
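The Pattern 3 and Robustness points can be illustrated with a minimal sketch. It is not broker code: `Message`, `segment_expired`, `create_time_is_plausible`, the one-day retention, and the `max_skew_ms` threshold are all hypothetical names and values chosen for illustration. The sketch assumes a segment may only be deleted once its newest message is past retention, which is why the broker would need to track the latest timestamp per segment.

```python
from dataclasses import dataclass
from typing import List

HOUR_MS = 60 * 60 * 1000


@dataclass
class Message:
    create_time_ms: int      # set by the producer (CreateTime)
    log_append_time_ms: int  # set by the broker on receipt (LogAppendTime)


def segment_expired(segment: List[Message], now_ms: int, retention_ms: int,
                    use_create_time: bool) -> bool:
    """Assumed rule: a segment is deletable once its newest message is past retention."""
    if use_create_time:
        newest = max(m.create_time_ms for m in segment)
    else:
        newest = max(m.log_append_time_ms for m in segment)
    return now_ms - newest > retention_ms


def create_time_is_plausible(create_time_ms: int, broker_now_ms: int,
                             max_skew_ms: int) -> bool:
    """Hypothetical sanity check: reject timestamps too far from broker time."""
    return abs(broker_now_ms - create_time_ms) <= max_skew_ms


now = 100 * 24 * HOUR_MS   # arbitrary "current time" in milliseconds
retention = 24 * HOUR_MS   # assumed one-day retention

# Pattern 3: two mirror makers append to the same segment at about the same time.
mixed_segment = [
    Message(create_time_ms=now, log_append_time_ms=now),                 # MirrorMaker1
    Message(create_time_ms=now - 24 * HOUR_MS, log_append_time_ms=now),  # MirrorMaker2
]

# Robustness: one message carries a CreateTime a year in the future.
future_segment = [
    Message(create_time_ms=now + 365 * 24 * HOUR_MS, log_append_time_ms=now),
]
```

Twelve hours later, `segment_expired(mixed_segment, now + 12 * HOUR_MS, retention, use_create_time=True)` is false even though the MirrorMaker2 message is already 36 hours old by CreateTime, because the newer message in the same segment holds it back. Likewise, `future_segment` never expires under CreateTime within any reasonable window, while it expires normally under LogAppendTime.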
...
| | Pattern 1 | Pattern 2 | Pattern 3 | Robustness |
|---|---|---|---|---|
| Preference | CT = LAT | CT > LAT | CT < LAT | CT < LAT |
In practice, it is unusual for every pipeline to have the same large latency, so LogAppendTime looks preferable to CreateTime for log retention.
...
| | Pattern 1 | Pattern 2 | Pattern 3 | Robustness |
|---|---|---|---|---|
| Preference | CT = LAT | CT < LAT | CT < LAT | CT < LAT |
Application use cases
Stream Processing
...