...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, when handling topic or partition creation requests, Kafka requires that every replica can be created in order to fulfill the request. While other functionality (produce/consume) is fault tolerant and can handle some brokers being down, topic and partition creation stops working as soon as there are not enough brokers available to host all replicas. In small clusters, when one node is unavailable, for example while a broker is being restarted, there may not be enough live brokers to satisfy topic/partition creation.
This is obvious in a few scenarios:

  • In a 3 node cluster, while a rolling restart is happening, users can't create topics with replication factor 3. 

  • In a 4 node cluster, while a node is down, a rolling restart also prevents topics with replication factor 3 from being created.

...


  • In a 9 node cluster with 3 nodes in each zone, if 1 zone were to go offline, the cluster would still contain enough nodes (6) to host a topic with replication factor 3. However, in some environments it may still be preferable to assign only 2 replicas to currently alive nodes (in 2 zones) and assign the last replica to a broker in the unavailable zone that is expected to come back online later.

The same considerations also apply to adding partitions to existing topics (CreatePartitions API).
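
The arithmetic behind these scenarios can be sketched as a toy model. This is only an illustration of the condition described above, not Kafka's actual code; the function name can_create_today and its parameters are hypothetical.

```python
def can_create_today(alive_brokers: int, replication_factor: int) -> bool:
    """Toy model of the current behaviour: creation succeeds only if
    every replica can be placed on a distinct live broker."""
    return alive_brokers >= replication_factor


# 3 node cluster, one broker restarting: 2 live brokers, replication factor 3
print(can_create_today(alive_brokers=2, replication_factor=3))

# 4 node cluster, one node down and one restarting: again 2 live brokers
print(can_create_today(alive_brokers=2, replication_factor=3))

# 9 node cluster with one zone of 3 offline: 6 live brokers
print(can_create_today(alive_brokers=6, replication_factor=3))
```

The first two calls model the cases where creation is rejected. The third succeeds under this check, which is why the zone scenario is about replica placement preference rather than a hard failure.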

...