...
The __cluster_metadata topic will have a cleanup.policy value of snapshot. This configuration can only be read; updates to this configuration will not be allowed.

metadata.snapshot.min.cleanable.ratio - The minimum ratio of snapshot records that have to change before generating a snapshot. See section "When to Snapshot". The default is 0.5 (50%).

metadata.snapshot.min.records.size - The minimum number of bytes in the replicated log between the latest snapshot and the high-watermark needed before generating a new snapshot. The default is 20MB.

metadata.start.offset.lag.time.max.ms - The maximum amount of time that the leader will wait for an offset to be replicated to all of the live replicas before advancing the LogStartOffset. See section "When to Increase the LogStartOffset". The default is 7 days.
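The interaction between the two snapshot-generation configurations above can be sketched as follows. This is a hypothetical helper, not the broker's implementation; it assumes both thresholds must be satisfied before a new snapshot is generated.

```python
def should_generate_snapshot(changed_records, total_records, bytes_since_snapshot,
                             min_cleanable_ratio=0.5,
                             min_records_size=20 * 1024 * 1024):
    # Hypothetical sketch of the snapshot trigger; assumes both the
    # metadata.snapshot.min.cleanable.ratio and the
    # metadata.snapshot.min.records.size thresholds must be met.
    if total_records == 0:
        return False
    cleanable_ratio = changed_records / total_records
    return (cleanable_ratio >= min_cleanable_ratio and
            bytes_since_snapshot >= min_records_size)
```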
...
- Find the snapshot for SnapshotId for the topic Name and partition PartitionIndex.
- Set the Size of each snapshot.
- Send the bytes in the snapshot from Position up to at most MaxBytes. If there are multiple partitions in the FetchSnapshot request, then the leader will attempt to evenly distribute the number of bytes sent across all of the partitions. The leader will not send more bytes in the response than ResponseMaxBytes, the minimum of MaxBytes in the request and the value configured in replica.fetch.response.max.bytes.
Errors:
- Each topic partition is guaranteed to receive at least the average of
ResponseMaxBytes
if that snapshot has enough bytes remaining. - If there are topic partitions with snapshots that have remaining bytes less than the average
ResponseMaxBytes
, then those bytes may be used to send snapshot bytes for other topic partitions.
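The two distribution guarantees above can be sketched as follows. This is a hypothetical allocator, not the broker's actual algorithm; it repeatedly gives each unsatisfied partition an even share of the remaining budget, so partitions needing less than the average free up bytes for the others.

```python
def distribute_bytes(remaining_bytes_per_partition, response_max_bytes):
    # Hypothetical sketch of the even byte distribution for FetchSnapshot.
    # remaining_bytes_per_partition maps partition -> snapshot bytes left to send.
    allocation = {p: 0 for p in remaining_bytes_per_partition}
    budget = response_max_bytes
    pending = set(remaining_bytes_per_partition)
    while pending and budget > 0:
        share = budget // len(pending)
        if share == 0:
            break
        # Partitions whose remaining bytes fit in their share are fully
        # satisfied; their unused share is redistributed on the next pass.
        satisfied = {p for p in pending
                     if remaining_bytes_per_partition[p] - allocation[p] <= share}
        if satisfied:
            for p in satisfied:
                take = remaining_bytes_per_partition[p] - allocation[p]
                allocation[p] += take
                budget -= take
            pending -= satisfied
        else:
            # Everyone still needs more than the average: give each its share.
            for p in pending:
                allocation[p] += share
                budget -= share
            pending = set()
    return allocation
```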
Errors:
SNAPSHOT_NOT_FOUND - when the fetch snapshot request specifies a SnapshotId that doesn't exist on the leader.

POSITION_OUT_OF_RANGE - when the fetch snapshot request specifies a position that is greater than the size of the snapshot.

NOT_LEADER_FOR_PARTITION - when the fetch snapshot request is sent to a replica that is not the leader.
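One plausible leader-side validation order for a single partition of a FetchSnapshot request, matching the errors listed above, can be sketched like this. The function and its parameters are hypothetical, not part of the actual protocol implementation.

```python
SNAPSHOT_NOT_FOUND = "SNAPSHOT_NOT_FOUND"
POSITION_OUT_OF_RANGE = "POSITION_OUT_OF_RANGE"
NOT_LEADER_FOR_PARTITION = "NOT_LEADER_FOR_PARTITION"

def validate_fetch_snapshot(is_leader, snapshot_sizes, snapshot_id, position):
    # Hypothetical validation for one partition of a FetchSnapshot request.
    # snapshot_sizes maps SnapshotId -> snapshot size in bytes.
    if not is_leader:
        return NOT_LEADER_FOR_PARTITION
    if snapshot_id not in snapshot_sizes:
        return SNAPSHOT_NOT_FOUND
    if position > snapshot_sizes[snapshot_id]:
        return POSITION_OUT_OF_RANGE
    return None  # no error; the leader can serve bytes starting at position
```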
...
This KIP is only implemented for the internal topic __cluster_metadata. An increase of the inter-broker protocol (IBP) version is not required, since we are only implementing this for the __cluster_metadata topic partition; that partition is new and will be handled by the KafkaRaftClient. Internal and external clients for all other topics can ignore the SnapshotId, as that field will not be set for topic partitions other than __cluster_metadata. The IBP will be increased when adding support for KIP-595 and KIP-630 to existing topic partitions that are not __cluster_metadata, to indicate that all of the brokers in the cluster support those KIPs.
Rejected Alternatives
Append the snapshot to the Kafka log: Instead of having a separate file for snapshots and a separate RPC for downloading the snapshot, the leader could append the snapshot to the log. This design has a few drawbacks. Because the snapshot is appended to the head of the log, it would be replicated to every replica in the partition, causing Kafka to use more network bandwidth than necessary.
...