...
Create another index file for each log segment, named SegmentBaseOffset.time.index, that indexes at minute level. The time index entry format is:
    Time Index Entry => Timestamp Offset
      Timestamp => int64
      Offset => int32
The time index granularity does not change the actual timestamp searching granularity; it only affects the time needed for a search. The search works the same way as an offset search: find the closest timestamp and its corresponding offset, then scan the log linearly from there until the target message is found. We prefer minute-level indexing because timestamp-based searches are usually rare, so it is probably not worth investing a significant amount of memory in the index.
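The two-step search described above can be sketched as follows. This is only an illustration, not the broker implementation: the class `TimeIndexLookupSketch`, the `Message` type, and the use of a `TreeMap` are all hypothetical, and the sketch assumes that "closest" means the latest index entry at or before the target timestamp and that message timestamps are non-decreasing within the segment.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a sparse time index lookup; not actual broker code.
final class TimeIndexLookupSketch {

    /** A log message reduced to the two fields the search needs. */
    static final class Message {
        final long offset;
        final long timestampMs;

        Message(long offset, long timestampMs) {
            this.offset = offset;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Returns the offset of the first message with timestamp >= targetMs,
     * or -1 if there is none. Assumes `log` is ordered by offset and that
     * timestamps are non-decreasing (an assumption of this sketch).
     */
    static long search(TreeMap<Long, Long> timeIndex, List<Message> log, long targetMs) {
        // Step 1: find the closest index entry at or before the target timestamp.
        Map.Entry<Long, Long> floor = timeIndex.floorEntry(targetMs);
        long startOffset = (floor == null) ? Long.MIN_VALUE : floor.getValue();

        // Step 2: linear scan from that offset until the target message is found.
        for (Message m : log) {
            if (m.offset < startOffset) {
                continue; // stands in for seeking to startOffset via the offset index
            }
            if (m.timestampMs >= targetMs) {
                return m.offset;
            }
        }
        return -1L;
    }
}
```

A coarser index only lengthens the linear scan in step 2; the result is the same message either way, which is why the index granularity does not affect the search granularity.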
The time index will be built based on the log index file. Every time a new entry is inserted into the log index file, we look at the timestamp of the message, and if it falls into the next minute, we insert an entry into the time index as well. The following table summarizes the memory consumption at different granularities. The numbers are calculated for a broker with 3500 partitions.
...
- Each broker keeps an in-memory timestamp index map - Map[TopicPartitionSegment, Map[TimestampByMinute, Offset]]
- The timestamp is on a minute boundary
- The offset is the offset of the first message in the log segment that falls into that minute
Create a timestamp index file for each log segment. Each entry in the file has the following format:
    Time Index Entry => Timestamp Offset
      Timestamp => int64
      Offset => int32
So the timestamp index file will simply become a persistent copy of the timestamp index map. The broker will load the timestamp map from the file on startup.
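As a rough illustration of the entry layout, the sketch below writes each entry as an 8-byte timestamp followed by a 4-byte offset and reads the entries back into a map, which is what loading on startup would amount to. The class name `TimeIndexEntryCodecSketch` is hypothetical, the byte order (big-endian, the `ByteBuffer` default) is an assumption, and so is the reading of the int32 offset as relative to the segment base offset. File I/O is left out; the sketch works on a byte array.

```java
import java.nio.ByteBuffer;
import java.util.TreeMap;

// Hypothetical sketch of the 12-byte time index entry layout; not actual broker code.
final class TimeIndexEntryCodecSketch {

    // Timestamp => int64 (8 bytes), Offset => int32 (4 bytes)
    static final int ENTRY_SIZE = Long.BYTES + Integer.BYTES;

    /** Serializes the map as consecutive (timestamp, offset) entries in timestamp order. */
    static byte[] encode(TreeMap<Long, Integer> index) {
        ByteBuffer buf = ByteBuffer.allocate(index.size() * ENTRY_SIZE);
        index.forEach((timestamp, offset) -> buf.putLong(timestamp).putInt(offset));
        return buf.array();
    }

    /** Rebuilds the in-memory map from the serialized entries, ignoring a trailing partial entry. */
    static TreeMap<Long, Integer> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        TreeMap<Long, Integer> index = new TreeMap<>();
        while (buf.remaining() >= ENTRY_SIZE) {
            long timestamp = buf.getLong();
            int offset = buf.getInt();
            index.put(timestamp, offset);
        }
        return index;
    }
}
```

With this layout, a segment that spans N distinct minutes needs 12 × N bytes of index data.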
- When a broker (regardless of whether it is a leader or a follower) receives a message, it does the following:
- Find which minute MIN the message with offset OFFSET falls in
- Check whether MIN is already in the in-memory timestamp map for the current log segment. If it is not, the broker adds [MIN->OFFSET] to both the in-memory timestamp index map and the timestamp index file.
- When a log segment is deleted, the broker:
- Removes the TopicPartitionSegment key from the in-memory map
- Removes the log segment's timestamp index file
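The append and delete steps above could look roughly like the sketch below. The class `TimestampIndexMapSketch`, its method names, and the use of a plain `String` as the TopicPartitionSegment key are all hypothetical. Timestamps are assumed to be in milliseconds, and writing to or removing the index file is only noted in comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the in-memory timestamp index map; not actual broker code.
final class TimestampIndexMapSketch {

    static final long MINUTE_MS = 60_000L;

    // Map[TopicPartitionSegment, Map[TimestampByMinute, Offset]]
    private final Map<String, TreeMap<Long, Long>> index = new HashMap<>();

    /**
     * Called for every received message. Returns true if a new [MIN->OFFSET]
     * entry was added, in which case the same entry would also be appended
     * to the segment's timestamp index file.
     */
    boolean onAppend(String segment, long timestampMs, long offset) {
        // Round the timestamp down to its minute boundary (MIN).
        long minute = Math.floorDiv(timestampMs, MINUTE_MS) * MINUTE_MS;
        TreeMap<Long, Long> perSegment = index.computeIfAbsent(segment, k -> new TreeMap<>());
        // Only the first message seen in a minute creates an entry.
        return perSegment.putIfAbsent(minute, offset) == null;
    }

    /** Called when a log segment is deleted; the index file would be removed as well. */
    void onSegmentDeleted(String segment) {
        index.remove(segment);
    }

    /** Returns the indexed offset for a minute boundary, or null if there is none. */
    Long offsetFor(String segment, long minute) {
        TreeMap<Long, Long> perSegment = index.get(segment);
        return (perSegment == null) ? null : perSegment.get(minute);
    }
}
```

Because `putIfAbsent` keeps the existing value, each minute maps to the offset of the first message that fell into it, matching the definition of the offset given earlier.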
...