...

Create another index file for each log segment, named SegmentBaseOffset.timeindex. The density of the index is upper bounded by the index.interval.bytes configuration.

...

 

Time Index Entry => Timestamp Offset
  Timestamp => int64
  Offset => int32
  • Timestamp - the biggest timestamp seen so far in this segment.
  • Offset - the next offset when the time index entry is inserted.
  • A time index entry (T, offset) means that, in this segment, any message whose timestamp is greater than T comes after offset.
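The entry layout above can be illustrated with a minimal Java sketch. TimeIndexEntryCodec and its methods are hypothetical names for illustration, not Kafka's actual classes. The sketch also assumes that the int32 offset is stored relative to the segment base offset, which the layout above does not state.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not Kafka's real implementation): packs one time index
// entry as an int64 timestamp followed by an int32 offset, 12 bytes in total.
final class TimeIndexEntryCodec {
    static final int ENTRY_SIZE = Long.BYTES + Integer.BYTES; // 8 + 4 = 12

    // Assumption: the int32 field holds the offset relative to the segment base offset.
    static ByteBuffer encode(long timestamp, long offset, long baseOffset) {
        long relative = offset - baseOffset;
        if (relative < 0 || relative > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("offset does not fit in int32: " + offset);
        }
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putLong(timestamp);
        buf.putInt((int) relative);
        buf.flip(); // make the 12 written bytes readable
        return buf;
    }

    // Reads the timestamp field without moving the buffer position.
    static long timestamp(ByteBuffer entry) {
        return entry.getLong(0);
    }

    // Reads the offset field and converts it back to an absolute offset.
    static long offset(ByteBuffer entry, long baseOffset) {
        return baseOffset + entry.getInt(Long.BYTES);
    }
}
```

Each entry occupies 12 bytes, which is where the 1.5x ratio to the offset index entry size comes from.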

Build the time index

...

  1. When a new log segment is created, the broker will create a time index file for the log segment. 
  2. The default initial / max size of the time index files is the same as that of the offset index files. (A time index entry is 1.5x the size of an offset index entry, so users should set the configuration accordingly.)
  3. Each log segment maintains the largest timestamp so far in that segment. The initial value of the largest timestamp is -1 for a newly created segment.
  4. When the broker receives a message, the message is appended to the log unless it is rejected because its timestamp exceeds the threshold. (The timestamp is either LogAppendTime or CreateTime, depending on the configuration.)
  5. When the broker appends the message to the log segment, if an offset index entry is inserted, it will also insert a time index entry, provided the max timestamp so far is greater than the timestamp in the last time index entry.
    • For message format v0, the timestamp is always -1, so no time index entry will be inserted when a message is appended.
  6. When the current active segment is rolled out or closed, a time index entry will be inserted into the time index to ensure that the last time index entry has the largest timestamp of the log segment.
    1. If the largest timestamp in the segment is non-negative (at least one message has a timestamp), the entry will be (largest_timestamp_in_the_segment -> base_offset_of_the_next_segment)
    2. If the largest timestamp in the segment is -1 (no message in the segment has a timestamp), the entry will be (last_modification_time_of_the_segment -> base_offset_of_the_next_segment)
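Steps 3 to 6 above can be sketched in Java as follows. This is an illustrative in-memory model only: TimeIndexSketch, onAppend and onRoll are hypothetical names, and the index is kept as a list rather than a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one segment's time index (not Kafka's real code).
final class TimeIndexSketch {
    static final long NO_TIMESTAMP = -1L;

    // Each element is {timestamp, offset}.
    final List<long[]> entries = new ArrayList<>();

    // Step 3: largest timestamp seen so far in this segment, -1 for a new segment.
    long maxTimestampSoFar = NO_TIMESTAMP;

    // Step 5: called for every appended message. A time index entry is added only
    // when an offset index entry was also inserted and the max timestamp so far
    // is greater than the timestamp of the last time index entry.
    void onAppend(long timestamp, long nextOffset, boolean offsetIndexEntryInserted) {
        if (timestamp > maxTimestampSoFar) {
            maxTimestampSoFar = timestamp;
        }
        if (offsetIndexEntryInserted && maxTimestampSoFar > lastEntryTimestamp()) {
            entries.add(new long[] {maxTimestampSoFar, nextOffset});
        }
    }

    // Step 6: called when the segment is rolled out or closed. Falls back to the
    // segment's last modification time when no message carried a timestamp.
    void onRoll(long baseOffsetOfNextSegment, long lastModifiedMs) {
        long ts = maxTimestampSoFar >= 0 ? maxTimestampSoFar : lastModifiedMs;
        entries.add(new long[] {ts, baseOffsetOfNextSegment});
    }

    long lastEntryTimestamp() {
        return entries.isEmpty() ? NO_TIMESTAMP : entries.get(entries.size() - 1)[0];
    }
}
```

With message format v0 every timestamp is -1, so maxTimestampSoFar never exceeds the last entry's timestamp and onAppend adds nothing. The only entry then comes from onRoll.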

The time index is not globally monotonically increasing across the segments of a partition. Instead, it is only monotonically increasing within each individual time index file, i.e. it is possible that the time index file of a later log segment contains a timestamp smaller than some timestamp in the time index file of an earlier segment.

...

To enforce time based log retention, the broker checks segments from the oldest forward to the latest. For each segment, the broker reads the last time index entry, whose timestamp is the largest timestamp of the messages in that log segment. If that timestamp has expired, the broker deletes the log segment. The broker stops at the first segment that has not expired, i.e. a segment whose timestamp has expired is not deleted until all older segments have been deleted.
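The retention scan described above can be sketched in Java as follows. TimeRetentionSketch and countExpired are hypothetical names, and the input is assumed to be the last time index timestamp of each segment, ordered from oldest to newest.

```java
import java.util.List;

// Hypothetical sketch of the retention scan (not Kafka's real code).
final class TimeRetentionSketch {

    // Returns how many of the oldest segments can be deleted. The scan stops at
    // the first segment that has not expired, even if a later segment has an
    // older timestamp.
    static int countExpired(List<Long> lastIndexTimestamps, long nowMs, long retentionMs) {
        int expired = 0;
        for (long ts : lastIndexTimestamps) {
            if (nowMs - ts <= retentionMs) {
                break; // first unexpired segment: stop scanning
            }
            expired++;
        }
        return expired;
    }
}
```

For example, with last timestamps 100, 500 and 50, a current time of 1000 and a retention of 600, only the first segment is deleted. The third segment has an expired timestamp but stays until the second one expires.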

Enforce time based log rolling

...

The change is backward compatible after KIP-31 and KIP-32 are checked in.

Users may want to increase log.index.size.max.bytes to 1.5x of its current value because the time index may take up to 1.5x the size of the offset index. We will change the index size to an appropriate value (3 MB for a 1 GB segment with a 4 KB index interval).
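The 3 MB figure can be checked with a short calculation. The sketch below is illustrative: TimeIndexSizing is a hypothetical name, and it assumes at most one index entry per index.interval.bytes of log data and 12 bytes per time index entry (int64 timestamp plus int32 offset).

```java
// Hypothetical sizing helper (not part of Kafka): upper bound on index size.
final class TimeIndexSizing {
    static final int TIME_INDEX_ENTRY_BYTES = 12;  // int64 timestamp + int32 offset
    static final int OFFSET_INDEX_ENTRY_BYTES = 8; // 12 / 1.5, per the ratio stated above

    // Assumption: at most one entry is written per indexIntervalBytes of log data.
    static long maxIndexBytes(long segmentBytes, long indexIntervalBytes, int entryBytes) {
        long maxEntries = segmentBytes / indexIntervalBytes;
        return maxEntries * entryBytes;
    }

    public static void main(String[] args) {
        long segment = 1024L * 1024 * 1024; // 1 GB segment
        long interval = 4096L;              // 4 KB index interval
        System.out.println(maxIndexBytes(segment, interval, TIME_INDEX_ENTRY_BYTES));
        System.out.println(maxIndexBytes(segment, interval, OFFSET_INDEX_ENTRY_BYTES));
    }
}
```

Under these assumptions a 1 GB segment holds at most 262144 entries, giving 3145728 bytes (3 MB) for the time index and 2097152 bytes (2 MB) for the offset index.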

The broker will keep an in-memory maxTimestampSoFar variable, which is initialized to -1 and is only updated when a message with a larger timestamp is appended to the log segment.

...