Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This KIP describes a protocol for extending KIP-595 and KIP-630 so that the operator can programmatically update the voter set in a way that is safe and is available. There are two important use cases that this KIP supports. One use case is that the operator wants to change the number of controllers by adding or removing a controller.  The other use case is that the operation wants to replace a controller because of a disk or hardware failure.

Proposed Changes

This KIP is inspired by Chapter 4 of Consensus: Bridging Theory and Practice [2]. The description of this KIP makes the assumption that the reader is familiar with the references enumerated at the bottom of this page.

Replica UUID

In addition to a replica ID each replica assigned to a KRaft topic partition will have a replica UUID. This UUID will be generated once and persisted as part of the quorum state for the KRaft topic partition. A replica (controller and broker) may host multiple topic partition partitions in multiple log directories. The replica UUID will be generated per topic partition hosted by the replica.

There are two cases when a replica UUID will be generate:

  1. The topic partition is a new topic partition and doesn't contain a quorum state file persisted on-disk.
  2. The topic partition has a version 0 of the quorum state file. Version 0 of the quorum state file doesn't contain a replica UUID field. The quorum state file will be upgraded to version 1 with the replica UUID persisted.

Replicas in a KRaft topic partition that are voter or can become voters will be identified by their ID and UUID.

...

The section describe how the quorum will get bootstrap to support this KIP. There are two configurations that Kafka will support and each will be explain separately. The Kafka cluster can be configured to use the existing controller.quorum.voters  or the new property called controller.quorum.bootstrap.servers. These two configurations are mutually exclusive the KRaft cluster is expected to use one or the other but not both.

controller.quorum.voters

This is a static quorum configuration where all of the voters' ID, host and port are specified. An example value for this configuration is 1@localhost:9092,2@localhost:9093,3@localhost:9094 .

...

When the leader of the topic partition first discovers the UUID of all the voters through the Fetch and FetchSnapshot RPCs it will write a AddVoterRecord to the log for each of the voters described in the configuration.

The KRaft leader will not perform writes from the state machine (active controller) or client until is has written to the log an AddVoterRecord for every replica id in the controller.quorum.voters  configuration.

...

This state can be discovered by a client by using the DescribeQuorum RPC, the Admin client or the kafka-metadata-quorum .sh CLI. The client can now decide to add replica (3, UUID3') to the set of voters using the AddVoter RPC, the Admin client or the kafka-metadataa-quorum .sh CLI.

When the AddVoter RPC succeeds the voters will be (1, UUID1), (2, UUID2), (3, UUID3) and (3, UUID3'). The client can remove the failed disk from the voter set by using the RemoveVoter RPC, the Admin client or the kafka-metadata-quorum .sh CLI.

When the RemoveVoter RPC succeeds the voters will be (1, UUID1), (2, UUID2), and (3, UUID3'). At this point the Kafka cluster has successfully recover from a disk failure in a controller node.

...

Public Interfaces

Configuration

These two configurations are mutually exclusive the KRaft cluster is expected to use one or the other but not both.

controller.quorum.voters

This is an existing configuration. If the cluster uses this configuration to configure the quorum, adding new replica ids will not be supported. The cluster will only support changing the UUID for an existing replica id.

...

  1. Ongaro, Diego, and John Ousterhout. "In search of an understandable consensus algorithm." 2014 {USENIX} Annual Technical Conference ({USENIX}{ATC} 14). 2014.
  2. Ongaro, Diego. Consensus: Bridging theory and practice. Diss. Stanford University, 2014.

  3. KIP-595: A Raft Protocol for the Metadata Quorum
  4. KIP-630: Kafka Raft Snapshot
  5. KIP-631: The Quorum-based Kafka Controller