...

In small clusters, when one node is unavailable, for example while a broker is being restarted, there may not be enough live brokers to satisfy a topic/partition creation request.
For example, in a 3 node cluster, users can't create topics with replication factor 3 while a rolling restart is happening. In a 4 node cluster where one node is already down, a rolling restart likewise prevents topics with replication factor 3 from being created.

Another example: in a 9 node cluster with 3 nodes in each zone, if 1 zone were to go offline, the cluster would still contain enough nodes (6) to host a topic with replication factor 3. However, in some environments it may still be preferable to assign only 2 replicas to currently live nodes (in 2 zones) and assign the last replica to a broker in the unavailable zone that is expected to come back online later.

The same limitation applies to users wanting to add partitions to existing topics (CreatePartitions API).

Proposed Changes

We would like to allow users to create topics even when the available number of brokers is less than the number of requested replicas, as long as the expected size of the cluster under normal conditions is large enough. As soon as brokers re-join the cluster, the missing replicas will be created.

To allow that, we propose to keep track in ZooKeeper of brokers that are part of the cluster even after they disconnect ("observed brokers"), and to use that information when handling a topic creation or partition creation request, either to determine an assignment or to determine whether a given assignment is feasible.
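As a rough illustration of the "observed brokers" idea, the sketch below keeps two sets, live and observed, and builds an assignment from the observed set. It is not the proposed implementation: `ClusterView`, `can_create`, and `assign_replicas` are hypothetical names, the in-memory sets merely stand in for whatever would be stored in ZooKeeper, and preferring live brokers first is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterView:
    """Hypothetical in-memory stand-in for broker membership tracking."""
    live: set = field(default_factory=set)
    observed: set = field(default_factory=set)

    def register(self, broker_id):
        # A broker that joins is both live and observed.
        self.live.add(broker_id)
        self.observed.add(broker_id)

    def disconnect(self, broker_id):
        # A disconnected broker leaves the live set but stays observed.
        self.live.discard(broker_id)


def can_create(view, replication_factor):
    # Feasibility is judged against observed brokers, not only live ones.
    return replication_factor <= len(view.observed)


def assign_replicas(view, replication_factor):
    if not can_create(view, replication_factor):
        raise ValueError("replication factor exceeds observed brokers")
    # Assumption for this sketch: use live brokers first, then fill the
    # remaining slots with observed brokers that are currently offline.
    live = sorted(view.live)
    offline = sorted(view.observed - view.live)
    return (live + offline)[:replication_factor]


view = ClusterView()
for broker in (1, 2, 3):
    view.register(broker)
view.disconnect(2)  # e.g. broker 2 is being restarted
print(assign_replicas(view, 3))
```

In this 3 node example only 2 brokers are live, yet a replication factor of 3 is still accepted because broker 2 remains in the observed set; its replica would be created once it re-joins.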

Even when the number of available brokers is sufficient and no assignment is specified, creating an under-replicated topic/partition may be preferred: for example, with replication factor 3 in a 3-zone region with 2 brokers per zone where one zone is down.
In such a case the create request could make use of all observed brokers rather than only the currently online ones.
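The zone scenario can be made concrete with a small sketch. It assumes 3 zones with 2 brokers each and a simple round-robin over zones; `assign_across_zones` and the zone names are hypothetical and this is not Kafka's actual rack-aware assignment algorithm. It only contrasts the outcome when the candidates are the online brokers versus all observed brokers.

```python
def assign_across_zones(brokers_by_zone, replication_factor):
    """Pick brokers round-robin across zones (illustrative only)."""
    zones = sorted(brokers_by_zone)
    queues = {zone: sorted(brokers_by_zone[zone]) for zone in zones}
    total = sum(len(queue) for queue in queues.values())
    if replication_factor > total:
        raise ValueError("not enough candidate brokers")
    assignment = []
    while len(assignment) < replication_factor:
        for zone in zones:
            if queues[zone] and len(assignment) < replication_factor:
                assignment.append(queues[zone].pop(0))
    return assignment


# All brokers ever seen, grouped by zone; zone "c" is currently down.
observed = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
online = {"a": [1, 2], "b": [3, 4]}

online_only = assign_across_zones(online, 3)
with_observed = assign_across_zones(observed, 3)
print(online_only)    # two replicas end up in zone "a"
print(with_observed)  # one replica per zone, one of them offline for now
```

With only online brokers as candidates, two replicas share a zone. Using observed brokers keeps one replica per zone, at the cost of the topic being under-replicated until the third zone returns.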

We propose a protocol upgrade to CreateTopicsRequest and CreatePartitionsRequest that lets a request specify the minimum number of replicas that must be available at creation time for the request to be satisfied.
A new error code will be added to indicate a topic/partition creation was completed without the full replication factor.
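One way to picture how a minimum replica count and the new error code could interact is sketched below. The names `CreateResult`, `CREATED_UNDER_REPLICATED`, `min_available_replicas`, and `evaluate_creation` are invented for illustration; the actual request field and error code are not named in this section.

```python
from enum import Enum


class CreateResult(Enum):
    # Hypothetical outcomes; not actual Kafka error codes.
    OK = "ok"
    CREATED_UNDER_REPLICATED = "created_under_replicated"
    REJECTED = "rejected"


def evaluate_creation(replication_factor, min_available_replicas, live_replicas):
    """Classify a creation request by how many assigned replicas are live."""
    if min_available_replicas > replication_factor:
        raise ValueError("minimum cannot exceed the replication factor")
    if live_replicas >= replication_factor:
        return CreateResult.OK
    if live_replicas >= min_available_replicas:
        # Created, but flagged so the client knows replicas are missing.
        return CreateResult.CREATED_UNDER_REPLICATED
    return CreateResult.REJECTED


print(evaluate_creation(3, 2, 3))
print(evaluate_creation(3, 2, 2))
print(evaluate_creation(3, 2, 1))
```

Under these assumptions, a request with a minimum equal to the replication factor behaves as creation does today, while a lower minimum opts in to under-replicated creation.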

A broker configuration will be added to allow or disallow creation without the full replication factor.
Topic policies could be used to add more complex validations, for example to allow creation with the min-isr factor but not less.

...