...

Discussion thread: here

JIRA: KAFKA-9119

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Once the last broker node has been rolled, there will be no more need for ZooKeeper.  We will remove it from the configuration of the controller quorum nodes, and then roll the controller quorum to fully remove it.
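As a rough sketch of this last step (zookeeper.connect is the existing ZooKeeper connection setting; the controller.quorum.voters key below is purely an illustrative assumption, since this KIP does not define the quorum configuration names), a controller quorum node's configuration would change from something like:

    # during the transition: the controller quorum still points at ZooKeeper
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
    controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093

to the same file with the zookeeper.connect line removed, after which the controller quorum nodes are rolled one at a time.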

Rejected Alternatives

Combined Controller and Broker Nodes

We could have combined the broker and the controller in the same JVM.  This would have the advantage of minimizing the number of JVMs.

However, there are several advantages to keeping them separate.  One is that the deployment model is more familiar to Kafka administrators: if they previously ran a certain number of ZooKeeper nodes, they can run the same number of controller nodes without rethinking cluster sizing or topology.

Another reason is to avoid an unbalanced load.  As the amount of metadata managed by the controller grows, the nodes which must serve this metadata will experience a correspondingly heavier load.  This makes it less realistic to treat the controller nodes exactly the same as all other nodes when performing rebalancing or partition assignment.  Using separate nodes reduces the chance that the current controller will be disrupted by heavy load on a particular broker.  For clusters where the load is small enough that this is not an issue, the system administrator can simply choose to co-locate the controller and broker JVMs.
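How co-location would be expressed in configuration is not defined by this KIP; purely as an illustrative sketch (the process.roles, node.id, and controller.quorum.voters keys are assumptions made up for this example), a combined node might look like:

    # hypothetical node running the broker and controller roles in one JVM
    process.roles=broker,controller
    node.id=1
    controller.quorum.voters=1@node1:9093,2@node2:9093,3@node3:9093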

Pluggable Consensus

Rather than managing metadata ourselves, we could make the metadata storage layer pluggable so that it could work with systems other than ZooKeeper.  For example, we could make it possible to store metadata in etcd, Consul, or similar systems.

Unfortunately, this strategy would not address either of the two main goals of ZooKeeper removal.  Because they have ZooKeeper-like APIs and design goals, these external systems would not let us treat metadata as an event log.  Because they are still external systems that are not integrated with the project, deployment and configuration would still remain more complex than they needed to be.

Supporting multiple metadata storage options would inevitably decrease the amount of testing we could give to each configuration.  Our system tests would have to either run against every possible metadata storage mechanism, which would greatly increase the resources needed, or leave some configurations under-tested.  Increasing the size of the test matrix in this fashion would hurt the project.

Additionally, if we supported multiple metadata storage options, we would have to use "least common denominator" APIs.  In other words, we could not use any API unless all possible metadata storage options supported it.  In practice, this would make it difficult to optimize the system.

Follow-on Work

This KIP expresses a vision of how we would like to evolve Kafka in the future.  We will create follow-on KIPs to hash out the concrete details of each change.

These KIPs will include:

  • A KIP to implement Raft replication in Kafka.  This will specify the new replication protocol and the details of each new RPC.
  • A KIP for allowing kafka-configs.sh to change topic configurations without using ZooKeeper.  It can already change broker configurations without ZooKeeper, but it needs to be able to change all configurations without ZooKeeper (see the sketch after this list).
  • A KIP for adding APIs to replace direct ZK access by the brokers.
  • A KIP describing the controller changes.  This will also specify how metadata is stored, and so on.
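To make the kafka-configs.sh item above concrete, here is a hedged sketch of the current split (hostnames, entity names, and config values are placeholders):

    # broker configs can already be altered over the Kafka protocol, without ZooKeeper
    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
        --entity-type brokers --entity-name 0 \
        --alter --add-config log.cleaner.threads=2

    # topic configs, at the time of this KIP, are still altered through ZooKeeper
    bin/kafka-configs.sh --zookeeper localhost:2181 \
        --entity-type topics --entity-name my-topic \
        --alter --add-config retention.ms=86400000

The follow-on KIP would allow the topic-level change to go through --bootstrap-server as well.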

...

References

The Raft consensus algorithm

...