...

In this KIP we propose introducing a time based log index using the timestamp of the messages introduced in KIP-32.

Public Interfaces

This KIP introduces a new configuration, time.index.interval.ms, on the broker side to control the granularity of the time based index.

There will also be some behavioral changes to time based log retention and log rolling.

...

Create another index file for each log segment with the name SegmentBaseOffset.time.index. The density of the index is defined by the time.index.interval.ms and index.interval.bytes configurations.

The time index entry format is:

...

Second    86400    3.4 GB
Minute    1440     57 MB

...

This configuration allows users to change the granularity of indexing.

Build the time index

Based on the proposal in KIP-32, the broker will build the time index in the following way:

  1. When the broker receives a message, if the message is not rejected because its timestamp exceeds the threshold, the message will be appended to the log.
  2. The timestamp will be either LogAppendTime or CreateTime, depending on the configuration.
  3. When a new log segment is created, the broker will create a time index file for the log segment.
  4. The time index is not globally monotonically increasing. Instead, it is only guaranteed to be monotonically increasing within each time index file, i.e. it is possible that the time index file for a later log segment contains a smaller timestamp than some timestamp in the time index file of an earlier segment.
  5. The broker will insert a time index entry in the following scenarios:
    1. The time index file is empty and a message is appended to the log segment.
    2. The timestamp of the appended message is greater than the timestamp of the last time index entry AND the broker has appended more than index.interval.bytes since the last time index entry insertion.
    3. When a log segment is closed, the broker will write a time index entry to the time index file. That time index entry points to the message with the largest timestamp in this log segment.
  6. The default initial / max size of the time index files is the same as that of the offset index files.
  7. If all the messages in a log segment have message.format.version before 0.10.0, the broker will insert a time index entry (last_modification_time_of_the_segment -> offset_of_the_last_message_in_the_segment) to the time index.
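
The insertion scenarios above can be sketched roughly as follows. This is only an illustration, not the broker's implementation: the class name TimeIndexSketch, its fields and its method names are hypothetical, entries are held in an in-memory list instead of a file, and skipping the close-time entry when the largest timestamp is already the last entry is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TimeIndexSketch:
    """Hypothetical in-memory model of the time index of one log segment."""
    index_interval_bytes: int
    entries: List[Tuple[int, int]] = field(default_factory=list)  # (timestamp, offset)
    bytes_since_last_entry: int = 0
    max_timestamp: int = -1
    offset_of_max_timestamp: int = -1

    def on_append(self, timestamp: int, offset: int, size_bytes: int) -> None:
        """Apply the two per-append insertion scenarios."""
        self.bytes_since_last_entry += size_bytes
        # Track the message with the largest timestamp for the close-time entry.
        if timestamp > self.max_timestamp:
            self.max_timestamp = timestamp
            self.offset_of_max_timestamp = offset
        if not self.entries:
            # The index is empty and a message is appended.
            self._insert(timestamp, offset)
        elif (timestamp > self.entries[-1][0]
              and self.bytes_since_last_entry > self.index_interval_bytes):
            # Newer timestamp AND more than index.interval.bytes appended
            # since the last time index entry.
            self._insert(timestamp, offset)

    def on_close(self) -> None:
        """On segment close, point at the message with the largest timestamp."""
        last_indexed = self.entries[-1][0] if self.entries else -1
        if self.max_timestamp > last_indexed:  # assumption: avoid a duplicate entry
            self._insert(self.max_timestamp, self.offset_of_max_timestamp)

    def _insert(self, timestamp: int, offset: int) -> None:
        self.entries.append((timestamp, offset))
        self.bytes_since_last_entry = 0
```

Because an entry is only added when its timestamp exceeds that of the previous entry, the timestamps within one such index stay monotonically increasing, which matches the per-file guarantee described in the list.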

Broker startup

On broker startup, the broker will need to find the latest timestamp of the current active log segment. The latest timestamp is needed for the next log index append. The broker will find the largest timestamp of the active segment by looking at the last inserted time index entry and scanning from there until the log end.
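
As a rough sketch of this recovery step, the function below starts from the last time index entry and scans the remaining messages of the active segment. The names IndexEntry, Message and recover_largest_timestamp are hypothetical, the segment is modelled as an in-memory list, and scanning the whole segment when the index is empty is an assumption.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    timestamp: int
    offset: int


class Message(NamedTuple):
    offset: int
    timestamp: int


def recover_largest_timestamp(time_index: List[IndexEntry],
                              active_segment: List[Message]) -> int:
    """Return the largest timestamp found from the last index entry to the log end."""
    if time_index:
        largest, start_offset = time_index[-1]
    else:
        # Assumption: with no index entry, scan the segment from its start.
        largest, start_offset = -1, -1
    for message in active_segment:
        # Only messages at or after the last indexed offset are scanned.
        if message.offset >= start_offset and message.timestamp > largest:
            largest = message.timestamp
    return largest
```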

Log Truncation

When the log is truncated, because the offset in the time index is also monotonically increasing, we will also truncate the time index entries whose offsets have been truncated.
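
A minimal sketch of that truncation, assuming the time index is an in-memory list of (timestamp, offset) pairs and that truncating to an offset removes every message at or beyond it. The function name truncate_time_index is hypothetical.

```python
import bisect
from typing import List, Tuple


def truncate_time_index(entries: List[Tuple[int, int]],
                        truncate_to_offset: int) -> List[Tuple[int, int]]:
    """Keep only the entries whose offset is below truncate_to_offset."""
    # Offsets in the time index increase monotonically, so the cut point
    # can be located with a binary search instead of a full scan.
    offsets = [offset for _, offset in entries]
    cut = bisect.bisect_left(offsets, truncate_to_offset)
    return entries[:cut]
```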

Enforce time based log retention

To enforce time based log retention, the broker will check the last time index entry of a log segment. The timestamp will be the latest timestamp of the messages in the log segment. So if that timestamp expires, the broker will delete the log segment. 
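
The retention check can be illustrated with a small sketch. The function name segment_expired and the parameter retention_ms (standing in for the configured retention time) are hypothetical, and declining to expire a segment whose time index is empty is an assumption of the sketch.

```python
from typing import List, Tuple


def segment_expired(time_index: List[Tuple[int, int]],
                    now_ms: int,
                    retention_ms: int) -> bool:
    """Decide expiry from the last (timestamp, offset) entry of a segment."""
    if not time_index:
        return False  # assumption: nothing to judge the segment by
    last_timestamp, _ = time_index[-1]
    # The last entry carries the latest message timestamp in the segment.
    return now_ms - last_timestamp > retention_ms
```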

Enforce time based log rolling

Currently time based log rolling is based on the creation time of the log segment. With this KIP, time based rolling will instead be based on the largest timestamp ever seen in a log segment. A new log segment will be rolled out if the current time is greater than the largest timestamp ever seen in the log segment + log.roll.ms. When message.timestamp.type=CreateTime, users should set max.message.time.difference.ms appropriately together with log.roll.ms to avoid frequent log segment roll out.
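
The rolling condition amounts to a single comparison, sketched below. The function and parameter names are hypothetical; log_roll_ms stands in for the log.roll.ms setting.

```python
def should_roll(largest_timestamp_ms: int, now_ms: int, log_roll_ms: int) -> bool:
    """Roll when the current time has passed largest timestamp + log.roll.ms."""
    return now_ms > largest_timestamp_ms + log_roll_ms
```

For example, with log_roll_ms set to one hour, a segment whose largest timestamp is already two hours old satisfies the condition as soon as it is checked. This is why producers sending stale CreateTime timestamps could trigger frequent rolls.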

...

Search message by timestamp

When searching by timestamp, the broker will start from the earliest log segment and check its last time index entry. If the timestamp of the last time index entry is greater than the target timestamp, the broker will do a binary search on that time index to find the closest index entry and scan the log from there. Otherwise it will move on to the next log segment.
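
The lookup can be sketched as below, under several assumptions. Each segment is modelled as a pair of in-memory lists, the "closest" index entry is taken to be the last entry whose timestamp does not exceed the target, and the result is the offset of the first scanned message whose timestamp is at least the target. The function name search_by_timestamp is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

# (time index as (timestamp, offset) pairs, messages as (offset, timestamp) pairs)
Segment = Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]


def search_by_timestamp(segments: List[Segment], target_ts: int) -> Optional[int]:
    """Return the offset of the first message found at or after target_ts."""
    for time_index, messages in segments:  # earliest segment first
        # Move on unless the last entry is greater than the target timestamp.
        if not time_index or time_index[-1][0] <= target_ts:
            continue
        timestamps = [ts for ts, _ in time_index]
        # Binary search for the last entry with timestamp <= target_ts.
        pos = bisect.bisect_right(timestamps, target_ts) - 1
        # If every entry is newer than the target, scan from the segment start.
        start_offset = time_index[pos][1] if pos >= 0 else -1
        for offset, ts in messages:
            if offset >= start_offset and ts >= target_ts:
                return offset
    return None
```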

Searching by timestamp will have better accuracy. The guarantees provided are:

...