Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Currently when handling topics or partitions creation requests, Kafka enforces all replicas to be created in order to fulfill the request. While all other functionalities (produce/consume) are fault tolerant and can handle some brokers down, topics and partitions creations stop working as soon as there are no enough replicas available. In small clusters, when one node is unavailable, for example when a broker is being restarted, it's possible that there are not enough alive replicas to satisfy topic/partition creation even though there are enough brokers to satisfy the "min.insync.replicas" configuration of the topic.

This is obvious in a few scenarios:

...

We would like to allow users to create topics even when the available number of brokers is less than the number of requested replicas, as long as the expected size of the cluster under normal conditions is large enough. As soon as brokers re-join the cluster, the missing replicas will be created. Also a topic should only be created when there are at least enough brokers to satisfy "min.insync.replicas" configuration of the topic.

To allow that, we propose to keep track of brokers that are part of the cluster in zookeeper even after they disconnect ("observed brokers"), and use that information to determine an assignment (or to determine if a given assignment is feasible) on handling a topic creation request or partition creation request.

When the number of available brokers is sufficient less than the replication factor and no assignment is specified, creating an under-replicated topic/partition may can be preferred: e.g. enabled. For example creating a topic with replication factor 3, in a 3-zones region with 2 brokers 1 broker per region, but a zone is down .
In such case the create request could make use of all observed brokers rather than the currently online ones. A topic will only be created if there are at least "min.insync.replicas" brokers available, ie a topic can be created under-replicated but it will always be at least at min.insync at creationor a rolling restart is happening.

Public Interfaces

Broker Configuration

...

Administrators can disable (false), or enable (true) creations of topics/partitions without the full replication factor. Defaulting to false for compatibility.


Zookeeper changes:

New non-ephemeral node nodes under /brokers called observed_ids.
Upon starting, brokers will create/update the non ephemeral node for their current id containing the same information as the existing ephemeral brokers/ids node. 

...