...

The number of partitions is a lever in controlling the performance of a Kafka cluster. Increasing the number of partitions can lead to higher availability and performance. However, increasing the number beyond a point can degrade performance on various fronts. See Appendix A1, which describes the impact on producer performance.

...


Having lots of partitions on a broker can also increase the recovery time in case of broker failure. See Appendix A2, which describes the impact on recovery time under broker failure. We have seen multiple issues in production clusters where having a large number of partitions leads to lower availability, yields live-locked clusters requiring hours of operational effort spent in recovery, and in some cases requires entirely deleting the clusters and starting all over again, implying 100% data loss.

The current recommendation to have no more than 4000 partitions per broker is not enforced by Kafka. Therefore, it is possible for Kafka users to accidentally overload their cluster to the point of no return.

In order to mitigate these issues, we propose a configuration named max.broker.partitions to limit the number of partitions per broker. Different brokers in a cluster should be configurable with different values for this configuration, in order to support heterogeneous clusters where some brokers are more resourceful than others. To support dynamic environments where the resources allocated to a broker may change while the broker is running, it should be possible to modify this configuration dynamically, i.e., without restarting the brokers.
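To make the intended semantics concrete, the sketch below shows the kind of check the controller would need to perform before placing new partition replicas. All names here are hypothetical and only illustrate the rule; the actual check would live in the controller's replica assignment path.

Code Block
languagejava
import java.util.Map;

// Hypothetical sketch (names are illustrative, not part of this proposal) of the
// capacity rule that max.broker.partitions implies: a request is admissible only
// if every broker stays at or below its configured limit after the assignment.
public class PartitionCapacityCheck {

    /**
     * @param currentCounts partitions currently hosted, keyed by broker id
     * @param newReplicas   replicas the request would add, keyed by broker id
     * @param limits        effective max.broker.partitions, keyed by broker id
     * @return true if no broker would exceed its limit
     */
    static boolean fitsWithinLimits(Map<Integer, Integer> currentCounts,
                                    Map<Integer, Integer> newReplicas,
                                    Map<Integer, Integer> limits) {
        for (Map.Entry<Integer, Integer> entry : newReplicas.entrySet()) {
            int brokerId = entry.getKey();
            int projected = currentCounts.getOrDefault(brokerId, 0) + entry.getValue();
            // Brokers without an explicit limit fall back to the int32 max default.
            if (projected > limits.getOrDefault(brokerId, Integer.MAX_VALUE)) {
                return false; // would exceed the limit; the request must be rejected
            }
        }
        return true;
    }
}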

...

Config name              Type     Default              Update-mode
max.broker.partitions    int32    int32's max value    per-broker


...

Kafka administrators can also specify the max.broker.partitions configuration in the server.properties file.

Kafka users will also be able to set/modify the max.broker.partitions configuration via ZooKeeper.

For example,

Code Block
languagebash
./kafka-configs.sh --zookeeper $ZOOKEEPER --alter --add-config max.broker.partitions=1000 --entity-type brokers --entity-name 1

applies a limit of 1000 to broker 1 only, and

Code Block
languagebash
./kafka-configs.sh --zookeeper $ZOOKEEPER --alter --add-config max.broker.partitions=1000 --entity-type brokers --entity-default

applies a limit of 1000 to each of the brokers.
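Since the configuration is dynamic and scoped per-broker, the same update should also be reachable programmatically. The snippet below is a minimal sketch using the standard Admin API's incrementalAlterConfigs; it assumes max.broker.partitions is registered as a dynamic per-broker config as proposed, and the bootstrap address is made up for illustration.

Code Block
languagejava
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class AlterMaxBrokerPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Target broker 1; an empty entity name ("") would address the
            // cluster-wide default, mirroring --entity-default above.
            ConfigResource broker1 = new ConfigResource(ConfigResource.Type.BROKER, "1");
            AlterConfigOp setLimit = new AlterConfigOp(
                new ConfigEntry("max.broker.partitions", "1000"),
                AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                Collections.singletonMap(broker1, Collections.singletonList(setLimit));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}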

API Exceptions

CreateTopics, CreatePartitions and AlterPartitionReassignments APIs will throw the following exception if it is not possible to satisfy the request while respecting the max.broker.partitions limit.

Code Block
languagejava
public class WillExceedBrokerPartitionCapacityException extends ApiException { ... }

In order to be actionable, the exception message will contain a map of brokers with their respective partition capacities.

Corresponding to this exception, we will have the following API error code.

Code Block
languagejava
WILL_EXCEED_BROKER_PARTITION_CAPACITY(88, "Cannot satisfy request without exceeding the max.broker.partitions limit on brokers", WillExceedBrokerPartitionCapacityException::new); 
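As a sketch of how this surfaces to clients, the snippet below creates a topic and reports the proposed error if the placement would exceed some broker's limit. The exception and error code are the ones defined above; the topic name, sizing, and bootstrap address are made up for the example, and the error reaches the client wrapped in the usual ExecutionException.

Code Block
languagejava
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithinCapacity {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 200 partitions with 3-way replication: 600 replicas to place.
            NewTopic topic = new NewTopic("my-topic", 200, (short) 3);
            try {
                admin.createTopics(Collections.singleton(topic)).all().get();
            } catch (ExecutionException e) {
                // With this KIP, the cause would be the proposed
                // WillExceedBrokerPartitionCapacityException; its message lists
                // brokers with their respective partition capacities, so it can
                // be surfaced directly to the operator.
                System.err.println("Topic creation failed: " + e.getCause().getMessage());
            }
        }
    }
}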

Proposed Changes

We will introduce a new per-broker int32 configuration called max.broker.partitions that Kafka administrators can specify in the server.properties file or modify via ZooKeeper. This will have a default value equal to the maximum int32 value that Java can represent (2³¹ - 1), which should be sufficient for the foreseeable future. In other words, we don't need int64.
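As an illustration of the shape of this configuration, the sketch below declares it with Kafka's generic ConfigDef builder. Only the name, type, default, and per-broker update mode come from this proposal; the containing class, the validator, and the importance level are assumptions made for the example (the real declaration would live in the broker's KafkaConfig).

Code Block
languagejava
import org.apache.kafka.common.config.ConfigDef;

public class MaxBrokerPartitionsConfigSketch {
    public static final String MAX_BROKER_PARTITIONS_CONFIG = "max.broker.partitions";
    public static final String MAX_BROKER_PARTITIONS_DOC =
        "The maximum number of partitions that this broker may host.";

    // Default of Integer.MAX_VALUE (2^31 - 1) keeps existing clusters unaffected.
    public static ConfigDef define(ConfigDef def) {
        return def.define(
            MAX_BROKER_PARTITIONS_CONFIG,
            ConfigDef.Type.INT,
            Integer.MAX_VALUE,
            ConfigDef.Range.atLeast(1),   // assumed validator: at least one partition
            ConfigDef.Importance.MEDIUM,  // assumed importance level
            MAX_BROKER_PARTITIONS_DOC);
    }
}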

...

Further, in environments such as Docker, where it is possible to allocate more resources to the broker on the fly, it would be restrictive if the max.broker.partitions configuration could not also be modified on the fly.

Appendix

A1: Producer performance with lots of partitions

The following table shows the impact on producer performance in the context of 3 m5.large EC2 broker instances, with 1 KB-sized records and a requested throughput of 240 Mbps on a single topic. The table indicates that throughput with 3-way replication improves from 52 Mbps to 101 Mbps in going from 10 to 100 partitions, but then degrades beyond that. Also, the throughput with 1-way replication is better than that with 3-way replication. This is because the number of replication Fetch requests a broker receives increases with the number of partitions on the broker, and with 1-way replication the broker does not have to deal with Fetch requests at all. But even so, the performance is much worse with 10000 partitions than with 10 partitions.

TODO: Redo these experiments without specifying a requested throughput.

Produce throughput (Mbps)

                 Number of partitions on a broker
Replication      10      100     500     1000    5000    10000
3-way            52      101     86      12      2.5     0.9
1-way            142     113     TODO    TODO    24      16

A2: Recovery time with lots of partitions

Having lots of partitions on a broker can also increase the recovery time in case of broker failure. TODO