...

  1. Find the corresponding RemoteLogSegmentId from RLMM, and the startPosition and endPosition from the offset index.
  2. Try to build a Records instance from the data fetched via RSM.fetchLogSegmentData(RemoteLogSegmentMetadata remoteLogSegmentMetadata, Long startPosition, Long endPosition):
    1. If successful, RemoteFetchPurgatory will be notified to return the data to the client.
    2. If the remote segment file has already been deleted, RemoteFetchPurgatory will be notified to return an error to the client.
    3. If the remote storage operation fails (remote storage is temporarily unavailable), the operation is retried with exponential back-off until the original consumer fetch request times out (see the sketch after this list).
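A minimal sketch of this flow in Java, assuming hypothetical helper types: the RSM.fetchLogSegmentData signature follows the text above, while the exception types, the purgatory callback, and the back-off parameters are illustrative assumptions rather than the actual KIP interfaces.

import java.io.InputStream;

public class RemoteFetchSketch {

    // Signature taken from the text; declared here only to keep the sketch self-contained.
    interface RemoteStorageManager {
        InputStream fetchLogSegmentData(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                                        Long startPosition, Long endPosition)
                throws SegmentDeletedException, TransientRemoteStorageException;
    }

    static class RemoteLogSegmentMetadata { /* segment id, offsets, ... */ }
    static class SegmentDeletedException extends Exception { }
    static class TransientRemoteStorageException extends Exception { }

    // Hypothetical stand-in for notifying RemoteFetchPurgatory.
    interface FetchCompletion {
        void complete(InputStream records);
        void completeWithError(Exception cause);
    }

    void fetchRemoteData(RemoteStorageManager rsm, RemoteLogSegmentMetadata metadata,
                         long startPosition, long endPosition, long requestTimeoutMs,
                         FetchCompletion purgatory) throws InterruptedException {
        long deadlineMs = System.currentTimeMillis() + requestTimeoutMs;
        long backoffMs = 100; // initial back-off, an assumed value
        while (true) {
            try {
                // 2: build the Records data from the fetched segment bytes.
                InputStream data = rsm.fetchLogSegmentData(metadata, startPosition, endPosition);
                purgatory.complete(data); // 2a: return the data to the client
                return;
            } catch (SegmentDeletedException e) {
                purgatory.completeWithError(e); // 2b: segment already deleted
                return;
            } catch (TransientRemoteStorageException e) {
                // 2c: retry with exponential back-off until the fetch request times out.
                if (System.currentTimeMillis() + backoffMs >= deadlineMs) {
                    purgatory.completeWithError(e);
                    return;
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 10_000);
            }
        }
    }
}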

Remote Log

...

Metadata State transitions


The leader broker copies log segments to the remote storage, puts the remote log segment metadata with the state “COPY_SEGMENT_STARTED”, and updates the state to “COPY_SEGMENT_FINISHED” once the copy is successful. Leaders also remove remote log segments based on the retention policy. Before a log segment is removed using RSM.deleteLogSegment(RemoteLogSegmentMetadata remoteLogSegmentMetadata), the leader updates the remote log segment state to DELETE_SEGMENT_STARTED, and updates it to DELETE_SEGMENT_FINISHED once the deletion is successful.
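A sketch of these leader-side transitions, assuming simplified RLMM and RSM shapes; the state names follow the text, everything else is illustrative.

public class SegmentLifecycleSketch {

    enum SegmentState {
        COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED,
        DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED
    }

    // Simplified assumption of the RLMM metadata-update call.
    interface RemoteLogMetadataManager {
        void updateRemoteLogSegmentState(String segmentId, SegmentState state);
    }

    // Simplified assumption of the RSM copy/delete calls.
    interface RemoteStorageManager {
        void copyLogSegment(String segmentId) throws Exception;
        void deleteLogSegment(String segmentId) throws Exception;
    }

    // Leader copy path: record COPY_SEGMENT_STARTED before copying and
    // COPY_SEGMENT_FINISHED only after the copy succeeds.
    void copySegment(RemoteLogMetadataManager rlmm, RemoteStorageManager rsm,
                     String segmentId) throws Exception {
        rlmm.updateRemoteLogSegmentState(segmentId, SegmentState.COPY_SEGMENT_STARTED);
        rsm.copyLogSegment(segmentId);
        rlmm.updateRemoteLogSegmentState(segmentId, SegmentState.COPY_SEGMENT_FINISHED);
    }

    // Retention-driven deletion follows the same started/finished pattern.
    void deleteSegment(RemoteLogMetadataManager rlmm, RemoteStorageManager rsm,
                       String segmentId) throws Exception {
        rlmm.updateRemoteLogSegmentState(segmentId, SegmentState.DELETE_SEGMENT_STARTED);
        rsm.deleteLogSegment(segmentId);
        rlmm.updateRemoteLogSegmentState(segmentId, SegmentState.DELETE_SEGMENT_FINISHED);
    }
}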

When a partition is deleted, the controller updates its state in RLMM to DELETE_MARKED. A task on the leader of each “__remote_log_segment_metadata_topic” partition consumes the messages, checks for messages with the state “DELETE_MARKED”, and schedules them for deletion. It commits the consumer offsets up to which these markers have been handled, so that whenever leadership switches to another broker, processing can continue from where it left off. The controller considers the topic partition deleted only when it determines that there are no log segments for that topic partition by using RLMM.PARTITION_MARKED, and it expects RLMM to have a mechanism to clean up the remote log segments. This process for the default RLMM is described in detail here.

RemoteLogMetadataManager implemented with an internal topic

...

remote.log.metadata.topic.replication.factor

Replication factor of the topic

Default: 3

remote.log.metadata.topic.partitions

Number of partitions of the topic

Default: 50

remote.log.metadata.topic.retention.ms

Retention of the topic in milliseconds

Default: 365 * 24 * 60 * 60 * 1000 (1 year)

remote.log.metadata.manager.listener.name

Listener name used by the RemoteLogMetadataManager implementation on the broker to connect to the local broker. The endpoint address of this listener is passed via the "bootstrap.servers" property when invoking RemoteLogMetadataManager#configure(Map<String, ?> props).

This is used by the Kafka clients created in the RemoteLogMetadataManager implementation.

remote.log.metadata.*

Any other properties should be prefixed with "remote.log.metadata." and they will be passed to RemoteLogMetadataManager#configure(Map<String, ?> props).

For example, the security configuration needed to connect to the local broker over the configured listener name is passed through these props.
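As an illustration, a sketch of how such properties might look when assembled into the map passed to RemoteLogMetadataManager#configure; the bootstrap address and security values below are placeholders, and whether the "remote.log.metadata." prefix is stripped before the call is not specified here.

import java.util.HashMap;
import java.util.Map;

public class RlmmConfigExample {
    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        // Endpoint of the configured listener, passed by the broker as "bootstrap.servers".
        props.put("bootstrap.servers", "broker-local:9092");
        // Security settings for that listener, forwarded under the prefix
        // (values are placeholders).
        props.put("remote.log.metadata.security.protocol", "SASL_SSL");
        props.put("remote.log.metadata.sasl.mechanism", "PLAIN");

        // The broker would then invoke remoteLogMetadataManager.configure(props).
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}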


Topic deletion lifecycle

The controller receives a delete request for a topic. It goes through the existing deletion protocol and takes all the replicas offline so that they stop serving fetch requests. Once all replicas reach this state, the controller publishes an event to the remote log metadata topic marking the topic as deleted.




A RemoteLogCleaner instance is created on the leader of each remote log segment metadata topic partition. It consumes messages from that partition and filters the delete-partition events that need to be processed. It also maintains a committed offset for this instance, so that after a leader failover to a different replica, processing can resume from where it left off, as sketched below.
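A sketch of this consume loop, assuming a plain KafkaConsumer assigned to the metadata topic partition; the event filtering and deduplication checks are hypothetical placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class RemoteLogCleanerLoop {

    // Hypothetical hooks for parsing, deduplicating and scheduling delete events.
    interface DeletePartitionHandler {
        boolean isDeletePartitionMarked(ConsumerRecord<byte[], byte[]> record);
        boolean alreadyProcessed(ConsumerRecord<byte[], byte[]> record);
        void scheduleDeletion(ConsumerRecord<byte[], byte[]> record);
    }

    void run(KafkaConsumer<byte[], byte[]> consumer, TopicPartition metadataPartition,
             DeletePartitionHandler handler) {
        // The cleaner consumes only the metadata topic partition it leads.
        consumer.assign(Collections.singleton(metadataPartition));
        while (true) {
            for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(500))) {
                if (handler.isDeletePartitionMarked(record) && !handler.alreadyProcessed(record)) {
                    handler.scheduleDeletion(record);
                }
                // Commit the offset so that a new leader resumes where this one left off.
                consumer.commitSync(Map.of(metadataPartition,
                        new OffsetAndMetadata(record.offset() + 1)));
            }
        }
    }
}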

RemoteLogCleaner (RLC) processes the request with the following flow.

  1. The controller publishes a delete_partition_marked event to indicate that the partition is marked for deletion. Multiple such events may be published when the controller restarts or fails over; they are deduplicated by RLC.
  2. RLC receives the delete_partition_marked event and processes it if it has not already been processed.
  3. RLC publishes a delete_partition_started event indicating that partition deletion has started.
  4. RLC gets all the remote log segments for the partition and deletes each of them with the following steps (see the sketch after this list).
  5. RLC publishes a delete_segment_started event with the segment id.
  6. RLC deletes the segment using RSM.
  7. RLC publishes a delete_segment_finished event with the segment id once the deletion is successful.
  8. RLC publishes a delete_partition_finished event once all the segments have been deleted successfully.
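A sketch of steps 3 through 8, with the event names taken from the list above; the publisher, the RLMM segment lookup, and the RSM deletion interfaces are simplified assumptions.

import java.util.List;

public class PartitionDeletionFlow {

    // Simplified assumptions for publishing events and accessing RLMM/RSM.
    interface EventPublisher { void publish(String eventType, String id); }
    interface Rlmm { List<String> listRemoteLogSegmentIds(String topicPartition); }
    interface Rsm { void deleteLogSegment(String segmentId) throws Exception; }

    void deletePartition(String topicPartition, EventPublisher publisher,
                         Rlmm rlmm, Rsm rsm) throws Exception {
        publisher.publish("delete_partition_started", topicPartition);          // step 3
        for (String segmentId : rlmm.listRemoteLogSegmentIds(topicPartition)) { // step 4
            publisher.publish("delete_segment_started", segmentId);             // step 5
            rsm.deleteLogSegment(segmentId);                                    // step 6
            publisher.publish("delete_segment_finished", segmentId);            // step 7
        }
        publisher.publish("delete_partition_finished", topicPartition);         // step 8
    }
}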

[We will add more details later about how the resultant state for each topic partition is computed.]

Public Interfaces

Compacted topics will not have remote storage support. 

...