...

  1. The __cluster_metadata topic will have a cleanup.policy value of snapshot. This configuration is read-only; updates to it will not be allowed.

  2. metadata.snapshot.min.cleanable.ratio - The minimum ratio of snapshot records that have to change before generating a snapshot. See section "When to Snapshot". The default is 0.5 (50%).

  3. metadata.snapshot.min.records.size - The minimum number of bytes in the replicated log between the latest snapshot and the high-watermark needed before generating a new snapshot (see the sketch after this list). The default is 20MB.
  4. metadata.start.offset.lag.time.max.ms - The maximum amount of time that the leader will wait for an offset to get replicated to all of the live replicas before advancing the LogStartOffset. See section "When to Increase the LogStartOffset". The default is 7 days.
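
The two snapshot thresholds above combine as follows. This is a minimal Java sketch of the "When to Snapshot" check, assuming both minimums must be satisfied before a new snapshot is generated (the descriptions read both as necessary conditions). The class SnapshotTrigger and its methods are hypothetical; only the configuration names and defaults come from this KIP.

```java
import java.util.Properties;

public final class SnapshotTrigger {
    private final double minCleanableRatio; // metadata.snapshot.min.cleanable.ratio, default 0.5
    private final long minRecordsSizeBytes; // metadata.snapshot.min.records.size, default 20MB

    public SnapshotTrigger(Properties props) {
        this.minCleanableRatio = Double.parseDouble(
            props.getProperty("metadata.snapshot.min.cleanable.ratio", "0.5"));
        this.minRecordsSizeBytes = Long.parseLong(
            props.getProperty("metadata.snapshot.min.records.size",
                Long.toString(20L * 1024 * 1024)));
    }

    // True when enough of the snapshot's records have changed AND enough bytes
    // have accumulated between the latest snapshot and the high-watermark.
    public boolean shouldSnapshot(double changedRecordRatio, long bytesSinceLastSnapshot) {
        return changedRecordRatio >= minCleanableRatio
            && bytesSinceLastSnapshot >= minRecordsSizeBytes;
    }

    public static void main(String[] args) {
        SnapshotTrigger trigger = new SnapshotTrigger(new Properties());
        // 60% of records changed and 25MB of new log data -> generate a snapshot.
        System.out.println(trigger.shouldSnapshot(0.60, 25L * 1024 * 1024)); // true
        // Only 10% of records changed -> no snapshot yet.
        System.out.println(trigger.shouldSnapshot(0.10, 25L * 1024 * 1024)); // false
    }
}
```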

...

  1. Find the snapshot for SnapshotId for the topic Name and partition PartitionIndex.

  2. Set the Size of each snapshot.
  3. Send the bytes in the snapshot from Position up to at most MaxBytes. If there are multiple partitions in the FetchSnapshot request, then the leader will attempt to evenly distribute the number of bytes sent across all of the partitions, as described below. The leader will not send more bytes in the response than ResponseMaxBytes, the minimum of MaxBytes in the request and the value configured in replica.fetch.response.max.bytes.

    1. Each topic partition is guaranteed to receive at least its even share of ResponseMaxBytes (ResponseMaxBytes divided by the number of partitions in the request) if its snapshot has enough bytes remaining.
    2. If a topic partition's snapshot has fewer remaining bytes than this average, then the unused bytes may be used to send snapshot bytes for other topic partitions.
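
This distribution rule can be sketched as a two-pass allocation. The following Java sketch is illustrative, not the implementation: SnapshotByteAllocator and its names are hypothetical, and remainingBytes maps each requested partition to the bytes left in its snapshot from the requested Position; only ResponseMaxBytes and the two rules above come from this KIP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class SnapshotByteAllocator {
    public static Map<String, Long> allocate(Map<String, Long> remainingBytes,
                                             long responseMaxBytes) {
        Map<String, Long> allocation = new LinkedHashMap<>();
        long average = responseMaxBytes / Math.max(1, remainingBytes.size());
        long unused = 0;

        // First pass: every partition gets at most its even share; partitions
        // with fewer remaining bytes free up their unused portion.
        for (Map.Entry<String, Long> e : remainingBytes.entrySet()) {
            long granted = Math.min(e.getValue(), average);
            allocation.put(e.getKey(), granted);
            unused += average - granted;
        }

        // Second pass: hand the unused bytes to partitions that still have
        // snapshot bytes remaining beyond their first-pass grant.
        for (Map.Entry<String, Long> e : remainingBytes.entrySet()) {
            if (unused == 0) break;
            long extra = Math.min(unused, e.getValue() - allocation.get(e.getKey()));
            if (extra > 0) {
                allocation.put(e.getKey(), allocation.get(e.getKey()) + extra);
                unused -= extra;
            }
        }
        return allocation;
    }

    public static void main(String[] args) {
        Map<String, Long> remaining = new LinkedHashMap<>();
        remaining.put("partition-0", 100L);
        remaining.put("partition-1", 10L);
        // ResponseMaxBytes = 60 -> average 30; partition-1 only needs 10,
        // so its spare 20 bytes go to partition-0 (30 + 20 = 50).
        System.out.println(allocate(remaining, 60L)); // {partition-0=50, partition-1=10}
    }
}
```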

Errors:

  1. SNAPSHOT_NOT_FOUND - when the fetch snapshot request specifies a SnapshotId that doesn't exist on the leader.

  2. POSITION_OUT_OF_RANGE - when the fetch snapshot request specifies a position that is greater than the size of the snapshot.

  3. NOT_LEADER_FOR_PARTITION - when the fetch snapshot request is sent to a replica that is not the leader.
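
For illustration, a leader-side validation that maps a FetchSnapshot request to these errors might look like the sketch below. The types, fields, and the order of the checks are assumptions; only the error names and their triggering conditions come from this KIP.

```java
public final class FetchSnapshotValidator {
    enum Errors { NONE, SNAPSHOT_NOT_FOUND, POSITION_OUT_OF_RANGE, NOT_LEADER_FOR_PARTITION }

    // snapshotSizeOrNull is null when the requested SnapshotId is unknown;
    // otherwise it is the total size of that snapshot in bytes.
    static Errors validate(boolean isLeader, Long snapshotSizeOrNull, long position) {
        if (!isLeader) {
            // Request was sent to a replica that is not the leader.
            return Errors.NOT_LEADER_FOR_PARTITION;
        }
        if (snapshotSizeOrNull == null) {
            // The requested SnapshotId does not exist on the leader.
            return Errors.SNAPSHOT_NOT_FOUND;
        }
        if (position > snapshotSizeOrNull) {
            // Position is greater than the size of the snapshot.
            return Errors.POSITION_OUT_OF_RANGE;
        }
        return Errors.NONE;
    }

    public static void main(String[] args) {
        System.out.println(validate(false, 1024L, 0L));   // NOT_LEADER_FOR_PARTITION
        System.out.println(validate(true, null, 0L));     // SNAPSHOT_NOT_FOUND
        System.out.println(validate(true, 1024L, 2048L)); // POSITION_OUT_OF_RANGE
        System.out.println(validate(true, 1024L, 512L));  // NONE
    }
}
```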

...

This KIP is only implemented for the internal topic __cluster_metadata. An increase of the inter-broker protocol (IBP) is not required, since this feature only applies to the __cluster_metadata topic partition; that partition is new and will be handled by the KafkaRaftClient. Internal and external clients for all other topics can ignore the SnapshotId, as that field will not be set for topic partitions other than __cluster_metadata. The IBP will be increased when adding support for KIP-595 and KIP-630 to existing topic partitions that are not __cluster_metadata, to indicate that all of the brokers in the cluster support those KIPs.

Rejected Alternatives

Append the snapshot to the Kafka log: Instead of having a separate file for snapshots and a separate RPC for downloading the snapshot, the leader could append the snapshot to the log. This design has a few drawbacks. Because the snapshot gets appended to the head of the log, it will be replicated to every replica in the partition, causing Kafka to use more network bandwidth than necessary.

...