...

For any fetch request, ReplicaManager first calls readFromLocalLog. If this method fails with an OffsetOutOfRange exception, ReplicaManager delegates the read to RemoteLogManager.readFromRemoteLog and returns the resulting LogReadResult. More details are given in the RLM/RSM tasks section.
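The local-then-remote fallback described above can be sketched as follows. This is a minimal illustration rather than the broker code: the dictionaries standing in for the local and remote logs, the `source` field, and the Python function names are assumptions made for the example.

```python
from dataclasses import dataclass


class OffsetOutOfRangeError(Exception):
    """Raised when the requested offset is not available in a log."""


@dataclass
class LogReadResult:
    records: list
    source: str  # "local" or "remote"; hypothetical field, for illustration only


def read_from_local_log(local_log, offset):
    # local_log is a hypothetical {offset: record} map standing in for local segments.
    if offset not in local_log:
        raise OffsetOutOfRangeError(offset)
    return LogReadResult([local_log[offset]], "local")


def read_from_remote_log(remote_log, offset):
    # remote_log is a hypothetical {offset: record} map standing in for remote segments.
    if offset not in remote_log:
        raise OffsetOutOfRangeError(offset)
    return LogReadResult([remote_log[offset]], "remote")


def fetch(local_log, remote_log, offset):
    # Try the local log first; fall back to remote storage only when the
    # local read reports that the offset is out of range.
    try:
        return read_from_local_log(local_log, offset)
    except OffsetOutOfRangeError:
        return read_from_remote_log(remote_log, offset)
```

In this sketch an offset that is missing from both logs still surfaces as an out-of-range error to the caller.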

Follower Requests/Replication

For follower fetch, the leader only returns the data that is still in the leader's local storage. If a LogSegment has been copied into remote storage by a leader broker, the follower doesn't need to copy this segment, since it is already present in remote storage. Instead, the follower retrieves the information about the segment from remote storage. If a replica becomes a leader, it can still locate and serve data from remote storage.

Other APIs

DeleteRecords

There is no change in the semantics of this API. It deletes records up to the given offset, if possible. This is equivalent to updating the logStartOffset of the partition log to the given offset, provided that offset is greater than the current log-start-offset and less than or equal to the high-watermark. If needed, it will clean remote logs asynchronously after updating the log-start-offset of the log.
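The offset condition above can be written as a small check. This is a sketch under stated assumptions: the function name is hypothetical, an out-of-range request simply leaves the offset unchanged here (real broker error handling is not modelled), and the asynchronous remote cleanup is not shown.

```python
def advance_log_start_offset(log_start_offset, high_watermark, target_offset):
    """Return the log-start-offset after a DeleteRecords request.

    The offset moves only when the target is greater than the current
    log-start-offset and less than or equal to the high-watermark.
    """
    if log_start_offset < target_offset <= high_watermark:
        return target_offset
    # Assumption for this sketch: otherwise the offset is left as it was.
    return log_start_offset
```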

ListOffsets

The ListOffsets API returns the offset(s) for the given timestamp by looking into either the local log or the remote log time indexes.

If the target timestamp is

  • ListOffsetRequest.EARLIEST_TIMESTAMP (value -2), it returns the logStartOffset of the log.

  • ListOffsetRequest.LATEST_TIMESTAMP (value -1), it returns the log-stable-offset or the log-end-offset, based on the isolation level in the request.

This API is enhanced to support a new target timestamp value, -3, called NEXT_LOCAL_TIMESTAMP. No new fields are added to the request and response schemas, but the version is bumped to indicate the update. A request with this value asks for the offset from which followers should start fetching to replicate the local logs. All records earlier than this offset can be considered as already copied to remote storage. Follower replicas use it to avoid fetching records that have been copied to remote tier storage and whose segments are no longer available locally on the leader.

When a follower replica tries to fetch records and receives an offset out of range error, it sends a request with the target timestamp set to NEXT_LOCAL_TIMESTAMP. It can then fetch messages starting from the earliest offset that is locally available on the leader; records before that offset have already been copied to remote storage.
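The follower's recovery step can be sketched as below. The `Leader` class and `follower_fetch` function are hypothetical stand-ins written for this example, not broker classes; records are modelled as plain integer offsets.

```python
NEXT_LOCAL_TIMESTAMP = -3  # target timestamp value given in the text


class OffsetOutOfRangeError(Exception):
    """Raised when a fetch offset is not in the leader's local log."""


class Leader:
    """Hypothetical leader that only serves offsets held in local storage."""

    def __init__(self, local_start_offset, log_end_offset):
        self.local_start_offset = local_start_offset
        self.log_end_offset = log_end_offset

    def fetch(self, offset):
        if offset < self.local_start_offset or offset > self.log_end_offset:
            raise OffsetOutOfRangeError(offset)
        # Records are represented by their offsets for simplicity.
        return list(range(offset, self.log_end_offset))

    def list_offsets(self, target_timestamp):
        if target_timestamp == NEXT_LOCAL_TIMESTAMP:
            return self.local_start_offset
        raise ValueError("only NEXT_LOCAL_TIMESTAMP is modelled in this sketch")


def follower_fetch(leader, fetch_offset):
    """Return (offset actually fetched from, records)."""
    try:
        return fetch_offset, leader.fetch(fetch_offset)
    except OffsetOutOfRangeError:
        # Ask the leader where its local log starts, then retry from there.
        new_offset = leader.list_offsets(NEXT_LOCAL_TIMESTAMP)
        return new_offset, leader.fetch(new_offset)
```

For example, with a leader whose local log covers offsets 100 to 104, a follower that asks for offset 0 is redirected to offset 100 instead of copying segments that already live in remote storage.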

For timestamps >= 0, it returns the offset of the first message whose timestamp is greater than or equal to the timestamp given in the request. The remote log time indexes are checked first, followed by the local log time indexes.
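Putting the ListOffsets cases together, the resolution order might look like the sketch below. It assumes each time index is a list of (timestamp, offset) pairs in offset order, and that the caller has already resolved the "latest" offset according to the isolation level. The function and parameter names are hypothetical.

```python
EARLIEST_TIMESTAMP = -2
LATEST_TIMESTAMP = -1
NEXT_LOCAL_TIMESTAMP = -3


def first_offset_at_or_after(time_index, target_ts):
    """Return the first offset whose timestamp is >= target_ts, else None."""
    for ts, offset in time_index:
        if ts >= target_ts:
            return offset
    return None


def list_offset(target_ts, log_start_offset, next_local_offset, latest_offset,
                remote_index, local_index):
    if target_ts == EARLIEST_TIMESTAMP:
        return log_start_offset
    if target_ts == LATEST_TIMESTAMP:
        # Assumed to be pre-resolved from the request's isolation level.
        return latest_offset
    if target_ts == NEXT_LOCAL_TIMESTAMP:
        return next_local_offset
    if target_ts < 0:
        raise ValueError("unsupported special timestamp: %d" % target_ts)
    # Remote time indexes are consulted first, then the local ones.
    found = first_offset_at_or_after(remote_index, target_ts)
    if found is None:
        found = first_offset_at_or_after(local_index, target_ts)
    return found
```

Checking the remote index first matches the ordering of the data: remote segments hold the older part of the log, so the first matching timestamp is found there whenever one exists.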

JBOD related changes

Currently, JBOD is supported by altering log dirs in two ways.

  • Altering to a different dir on the local broker

    • This can be done by copying remote log metadata files to the respective new topic partition directories in ReplicaAlterLogDirsThread. This will be implemented in the future.

  • Altering to a dir on a remote broker

    • This is equivalent to reassigning partitions to a different broker, which is already supported in this KIP.

RLM/RSM tasks and thread pools

...