...

Discussion thread: here

JIRA:

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	KAFKA-9119

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

In the proposed architecture, three controller nodes substitute for the three ZooKeeper nodes. The controller nodes and the broker nodes run in separate JVMs. The controller nodes elect a single leader for the metadata partition, shown in orange. Instead of the controller pushing out updates to the brokers, the brokers pull metadata updates from this leader. That is why the arrows point towards the controller rather than away.

The Controller Quorum

Note that although the controller processes are logically separate from the broker processes, they need not be physically separate. In some cases, it may make sense to deploy some or all of the controller processes on the same node as the broker processes. This is similar to how ZooKeeper processes may be deployed on the same nodes as Kafka brokers today in smaller clusters. As per usual, all sorts of deployment options are possible, including running in the same JVM.

The Controller Quorum

The controller nodes comprise a Raft quorum which manages the metadata log. This log contains information about each change to the cluster metadata. Everything that is currently stored in ZooKeeper, such as topics, The controller nodes comprise a Raft quorum which manages the metadata log. This log contains information about each change to the cluster metadata. Everything that is currently stored in ZooKeeper, such as topics, partitions, ISRs, configurations, and so on, will be stored in this log.

...

Once the last broker node has been rolled, there will be no more need for ZooKeeper. We will remove it from the configuration of the controller quorum nodes, and then roll the controller quorum to fully remove it.

Rejected Alternatives

Combined Controller and Broker Nodes

We could have combined the broker and the controller in the same JVM. This would have the advantage of minimizing the number of JVMs.

However, there are several advantages to keeping them separate. One is that the deployment model is more familiar to Kafka administrators. If they had a certain number of ZooKeeper nodes previously, they can just upgrade to having the same number of controller nodes without rethinking cluster sizing or topology.

Another reason is to avoid an unbalanced load. As the amount of metadata managed by the controller grows, the nodes which must serve this metadata will experience a correspondingly heavier load. This makes it less realistic to treat the controller nodes exactly the same as all other nodes when performing rebalancing or partition assignment. Using separate nodes reduces the chance that the current controller will be disrupted by heavy load on a particular broker. For clusters where the load is small enough that this is not an issue, the system administrator can simply choose to co-locate the controller and broker JVMs.

Pluggable Consensus

Rather than managing metadata ourselves, we could make the metadata storage layer pluggable so that it could work with systems other than ZooKeeper. For example, we could make it possible to store metadata in etcd, Consul, or similar systems.

Pluggable Consensus

Rather than managing metadata ourselves, we could make the metadata storage layer pluggable so that it could work with systems other than ZooKeeper. For example, we could make it possible to store metadata in etcd, Consul, or similar systems.

Unfortunately, this strategy would not address either of the two main goals of ZooKeeper removal. Because they have ZooKeeper-like APIs and design goals, these external systems would not let us treat metadata as an event log. Because they are still external systems that are not integrated with the project, deployment Unfortunately, this strategy would not address either of the two main goals of ZooKeeper removal. Because they have ZooKeeper-like APIs and design goals, these external systems would not let us treat metadata as an event log. Because they are still external systems that are not integrated with the project, deployment and configuration would still remain more complex than they needed to be.

...

This KIP expresses a vision of how we would like to evolve Kafka in the future. We will create follow-on KIPs to hash out the concrete details of each change.

These KIPs will include:

...

to hash out the concrete details of each

...

change.

A KIP for allowing kafka-configs.sh to change topic configurations without using ZooKeeper. It can already change broker configurations without Zookeeper, but it needs to be able to change all configurations without ZooKeeper.
A KIP for adding APIs to replace direct ZK access by the brokers.
A KIP describing the controller changes. This will also specify how metadata is stored, and so on.

...

References

The Raft consensus algorithm

...

Space shortcuts

Child pages

Versions Compared

Old Version 14

New Version Current

Key

The Controller Quorum

The Controller Quorum

Rejected Alternatives

Combined Controller and Broker Nodes

Pluggable Consensus

Pluggable Consensus

References

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 14

New Version Current

Key

The Controller Quorum

The Controller Quorum

Rejected Alternatives

Combined Controller and Broker Nodes

Pluggable Consensus

Pluggable Consensus

References