...


RemoteLogManager (RLM) is a new component that keeps track of remote log segments.

  • It will delegate the copy and read of these segments to a pluggable storage manager implementation (viz. RemoteStorageManager).

RLM has two modes:

  • RLM Leader - In this mode, RLM is the leader for a topic-partition. It checks for rolled-over LogSegments (those whose last message offset is less than the last stable offset of that topic partition) and copies them, along with their remote offset/time indexes, to the remote tier. RLM creates an index file, called RemoteLogSegmentIndex, per topic-partition to track remote LogSegments. These indexes are described in detail later. The RLM leader also serves fetch requests for older data from the remote tier. Local logs are not cleaned up until those segments are copied successfully to remote storage, even if their retention time/size has been reached.
  • RLM Follower - In this mode, RLM keeps track of the segments and index files on the remote tier and updates its RemoteLogSegmentIndex file per topic-partition. The RLM follower does not serve reads of old data from the remote tier. Local logs are not cleaned up until their remote log indexes are copied locally from remote storage, even if their retention time/size has been reached.

Core Kafka changes

To satisfy the goal of keeping Kafka changes minimal when RLM is not configured, Kafka behavior remains unchanged for existing users.

...

For each topic partition that has RLM configured, the RLM leader for that topic partition copies log segments whose last message offset is less than the last stable offset of that topic partition to remote storage. The active segment file (the last segment file of each partition, to which new records are appended) is never shipped to remote storage.
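The selection rule above can be illustrated with a minimal sketch. The class and method names (SegmentCopySelectorSketch, SegmentInfo, eligibleForCopy) are hypothetical and are not part of Kafka. The sketch assumes that segments are supplied in ascending offset order and that the last element of the list is the active segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are illustrative and not Kafka's API.
final class SegmentCopySelectorSketch {

    static final class SegmentInfo {
        final long baseOffset;
        final long lastOffset;

        SegmentInfo(long baseOffset, long lastOffset) {
            this.baseOffset = baseOffset;
            this.lastOffset = lastOffset;
        }
    }

    /**
     * Returns the segments that may be copied to remote storage.
     * Assumption: segments are sorted by offset and the final element is the
     * active segment, which is never shipped.
     */
    static List<SegmentInfo> eligibleForCopy(List<SegmentInfo> segments, long lastStableOffset) {
        List<SegmentInfo> eligible = new ArrayList<>();
        // Stop before the last element so the active segment is always skipped.
        for (int i = 0; i < segments.size() - 1; i++) {
            SegmentInfo segment = segments.get(i);
            if (segment.lastOffset >= lastStableOffset) {
                // Later segments have even higher offsets, so none of them qualify.
                break;
            }
            eligible.add(segment);
        }
        return eligible;
    }
}
```

With three segments covering offsets 0-99, 100-199 and 200-299 (active) and a last stable offset of 150, only the first segment is selected in this sketch.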

...

magic: int16 (current magic value is 0)

length: int16 (length of this entry)

crc: int32 (checksum from firstOffset to the end of this entry)

firstOffset: int64 (the Kafka offset of the 1st record)

lastOffset: int64 (the Kafka offset of the last record)

firstTimestamp: int64

lastTimestamp: int64

dataLength: int32 (length of the remote data)

rdiLength: int16

rdi: byte[] (Remote data identifier)

todo: We may change this format to have magic and crc for a batch of entries instead of having them for each entry.
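As an illustration of the entry layout listed above, the following sketch serializes and parses a single entry with java.nio.ByteBuffer and java.util.zip.CRC32. The class name RemoteLogIndexEntrySketch is hypothetical. Two points are assumptions rather than part of the format description: that "length" counts the bytes following the length field (crc plus the remaining fields), and that the checksum is CRC32. The field order and widths follow the list above.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch of one remote log index entry; not Kafka's actual class.
final class RemoteLogIndexEntrySketch {
    static final short MAGIC = 0;
    // Fixed-size fields after crc: four int64, dataLength (int32), rdiLength (int16).
    static final int FIXED_BODY_BYTES = 4 * Long.BYTES + Integer.BYTES + Short.BYTES;

    final long firstOffset;
    final long lastOffset;
    final long firstTimestamp;
    final long lastTimestamp;
    final int dataLength;
    final byte[] rdi;

    RemoteLogIndexEntrySketch(long firstOffset, long lastOffset, long firstTimestamp,
                              long lastTimestamp, int dataLength, byte[] rdi) {
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
        this.firstTimestamp = firstTimestamp;
        this.lastTimestamp = lastTimestamp;
        this.dataLength = dataLength;
        this.rdi = rdi;
    }

    ByteBuffer encode() {
        int bodyLength = FIXED_BODY_BYTES + rdi.length;
        // Assumption: "length" counts the bytes after the length field (crc + body).
        int length = Integer.BYTES + bodyLength;
        if (length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("entry too large for an int16 length");
        }
        ByteBuffer body = ByteBuffer.allocate(bodyLength);
        body.putLong(firstOffset).putLong(lastOffset)
            .putLong(firstTimestamp).putLong(lastTimestamp)
            .putInt(dataLength).putShort((short) rdi.length).put(rdi);
        body.flip();
        ByteBuffer out = ByteBuffer.allocate(2 * Short.BYTES + length);
        out.putShort(MAGIC).putShort((short) length).putInt(crcOf(body)).put(body);
        out.flip();
        return out;
    }

    static RemoteLogIndexEntrySketch decode(ByteBuffer in) {
        short magic = in.getShort();
        if (magic != MAGIC) {
            throw new IllegalStateException("unknown magic " + magic);
        }
        int length = in.getShort();
        int storedCrc = in.getInt();
        // View over exactly the bytes covered by the checksum.
        ByteBuffer body = in.slice();
        body.limit(length - Integer.BYTES);
        if (crcOf(body) != storedCrc) {
            throw new IllegalStateException("crc mismatch");
        }
        long firstOffset = body.getLong();
        long lastOffset = body.getLong();
        long firstTimestamp = body.getLong();
        long lastTimestamp = body.getLong();
        int dataLength = body.getInt();
        byte[] rdi = new byte[body.getShort()];
        body.get(rdi);
        // Advance the caller's buffer past this entry.
        in.position(in.position() + length - Integer.BYTES);
        return new RemoteLogIndexEntrySketch(firstOffset, lastOffset, firstTimestamp,
                                             lastTimestamp, dataLength, rdi);
    }

    // CRC over firstOffset..end of entry; duplicate() leaves the caller's position untouched.
    private static int crcOf(ByteBuffer body) {
        CRC32 crc = new CRC32();
        crc.update(body.duplicate());
        return (int) crc.getValue();
    }
}
```

Because decode advances the input buffer past the entry it has read, it can be called repeatedly to walk consecutive entries in a file.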

RDI (Remote data identifier) is the "pointer" or "URI" of the remote data. The format of RDI depends on the implementation. For example, RDI can be an HDFS file path and offset, or an S3 key and offset. When reading the remote records, RLM will use RDI to retrieve the remote data.

...

The RemoteOffsetIndex file and RemoteTimestampIndex file are similar to the existing .index file (offset index) and .timeindex file (timestamp index). The only difference is that they point into the corresponding remoteLogIndex file instead of a log segment file.
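To make the lookup concrete, here is a minimal in-memory sketch of an offset index whose values are positions in the remote log index file. The class RemoteOffsetIndexSketch and its array-based storage are assumptions for illustration only; the on-disk layout is not specified here. As with the existing offset index, the lookup returns the entry with the largest offset that is less than or equal to the target.

```java
// Hypothetical in-memory sketch; not Kafka's actual index implementation.
final class RemoteOffsetIndexSketch {
    private final long[] offsets;   // ascending Kafka offsets
    private final int[] positions;  // position of the matching entry in the remote log index file

    RemoteOffsetIndexSketch(long[] offsets, int[] positions) {
        if (offsets.length != positions.length) {
            throw new IllegalArgumentException("offsets and positions must have the same length");
        }
        this.offsets = offsets;
        this.positions = positions;
    }

    /**
     * Returns the remote log index position for the largest indexed offset
     * that is <= targetOffset, or -1 if the target precedes every entry.
     */
    int lookup(long targetOffset) {
        int low = 0;
        int high = offsets.length - 1;
        int found = -1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (offsets[mid] <= targetOffset) {
                found = mid;     // candidate; keep searching to the right
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return found < 0 ? -1 : positions[found];
    }
}
```

A timestamp index could be sketched the same way, with timestamps in place of offsets as the search key.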

Manage Remote Log Segments

...

The RLM of a follower broker retrieves remote segment information from the remote storage. It periodically checks the remote storage to discover new segments shipped by the leader.

When the follower discovers a new segment in remote storage, it will retrieve the index entries from remote storage and create a local remote log index file. The RemoteOffsetIndex file and RemoteTimestampIndex file are also created accordingly.

The leader may fail to ship segment data to remote storage on time. In such a situation, the follower has to keep its local segment files, even if the configured retention time is reached. The local segment files (and the corresponding index files) can only be deleted in the following 2 cases:

...