Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Currently when handling topics or partitions creation requests, Kafka currently enforces all replicas to be created in order to fulfill the request. While all other functionalities (produce/consume) are fault tolerant and can handle some brokers down, topics and partitions creations stop working as soon as there are no enough replicas available. In small clusters, when one node is unavailable, for example when a broker is being restarted, it's possible that there are not enough alive replicas online brokers to satisfy topic/partition creation even though there are enough brokers to satisfy the "min.insync.replicas" configuration of the topic.

This is obvious in a few scenarios:

...

We would like to allow users to create topics and partitions even when the available number of brokers is less than the number of requested replicas, as long as the expected size of the cluster under normal conditions is large enough. As soon as brokers re-join the cluster, the missing replicas will be created. Also a topic should only be created when there are at least enough brokers to satisfy "min.insync.replicas" configuration of the topic.

To allow that, we propose to keep track of brokers that are part of the cluster in zookeeper Zookeeper even after they disconnect ("observed brokers"), and use that information to determine an assignment (or to determine if a given assignment is feasible) on handling a topic creation request or partition creation request.

When the number of available brokers is less than the replication factor and no assignment is specified, creating an under-replicated topic/partition can be enabled. For example, creating a topic with replication factor 3, in a 3-zones region with 1 broker per region, but a zone is down or a rolling restart is happening.

...

NAME

DESCRIPTION

TYPE

DEFAULT

VALID VALUES

IMPORTANCE

DYNAMIC UPDATE MODE

enable.under.replicated.topic.creation

Behavior on Topic or Partition creation without the full set of replicas being active.

BOOLEAN

false

true, false

Medium

cluster-wide

...

  • For compatibility the default behavior remains unaltered: a number of active brokers equals (or greater) to the full replication factor is required for successful creation.

  • No migration is necessary.

  • Updates to the Decommissioning brokers section in the documentation
    • will mention that if a broker id is never to be reused then its corresponding node in zookeeper /brokers/observed_ids will need to be removed if enable.under.replicated.topic.creation is enabled.

Rejected Alternatives

  • Async replica assignment: Instead of relying on previously known brokers, we considered creating replicas on the available brokers at topic/partition create time and saving in zookeeper some information to asynchronously create missing replicas on the next suitable brokers that would join the cluster. Such a process would involve the controller checking if missing replicas existed when a broker joins the cluster. This alternative felt too complicated. Also the next joining broker may not be on the preferred rack assignment.

...