...

The number of partitions is a lever in controlling the performance of a Kafka cluster. Increasing the number of partitions can lead to higher availability and performance. However, increasing the number beyond a point can degrade performance on various fronts. See Appendix A1, which describes the impact on producer performance.

...


Having lots of partitions on a broker can also increase the recovery time in case of broker failure. See Appendix A2, which describes the impact on recovery time under broker failure. We have seen multiple issues in production clusters where having a large number of partitions leads to lower availability, yields live-locked clusters requiring hours of operational effort spent in recovery, and in some cases requires entirely deleting the clusters and starting all over again, implying 100% data loss.

The current recommendation to have no more than 4000 partitions per broker is not enforced by Kafka. Therefore, it is possible for Kafka users to accidentally overload their cluster to the point of no return.

In order to mitigate these issues, we propose a configuration named max.broker.partitions to limit the number of partitions per broker. Different brokers in a cluster should be configurable with different values for this configuration, in order to support heterogeneous clusters where some brokers are more resourceful than others. To support dynamic environments where the resources allocated to a broker may change while the broker is running, it should be possible to modify this configuration dynamically, i.e., without restarting the brokers.
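To make the intended semantics concrete, the sketch below shows the kind of check the controller would need to perform before placing new partition replicas. All names here are hypothetical and only illustrate the rule; the actual check would live in the controller's replica assignment path.

Code Block
languagejava
import java.util.Map;

// Hypothetical sketch (names are illustrative, not part of this proposal) of the
// capacity rule that max.broker.partitions implies: a request is admissible only
// if every broker stays at or below its configured limit after the assignment.
public class PartitionCapacityCheck {

    /**
     * @param currentCounts partitions currently hosted, keyed by broker id
     * @param newReplicas   replicas the request would add, keyed by broker id
     * @param limits        effective max.broker.partitions, keyed by broker id
     * @return true if no broker would exceed its limit
     */
    static boolean fitsWithinLimits(Map<Integer, Integer> currentCounts,
                                    Map<Integer, Integer> newReplicas,
                                    Map<Integer, Integer> limits) {
        for (Map.Entry<Integer, Integer> entry : newReplicas.entrySet()) {
            int brokerId = entry.getKey();
            int projected = currentCounts.getOrDefault(brokerId, 0) + entry.getValue();
            // Brokers without an explicit limit fall back to the int32 max default.
            if (projected > limits.getOrDefault(brokerId, Integer.MAX_VALUE)) {
                return false; // would exceed the limit; the request must be rejected
            }
        }
        return true;
    }
}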

...

Config name              Type     Default              Update-mode
max.broker.partitions    int32    int32's max value    per-broker


...

Kafka administrators can also specify the max.broker.partitions configuration in the server.properties file.

Kafka users will also be able to set/modify the max.broker.partitions configuration via ZooKeeper.

For example,

Code Block
languagebash
./kafka-configs.sh --zookeeper $ZOOKEEPER --alter --add-config max.broker.partitions=1000 --entity-type brokers --entity-name 1

applies a limit of 1000 to broker 1 only, and

Code Block
languagebash
./kafka-configs.sh --zookeeper $ZOOKEEPER --alter --add-config max.broker.partitions=1000 --entity-type brokers --entity-default

applies a limit of 1000 to each of the brokers.
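Since the configuration is dynamic and scoped per-broker, the same update should also be reachable programmatically. The snippet below is a minimal sketch using the standard Admin API's incrementalAlterConfigs; it assumes max.broker.partitions is registered as a dynamic per-broker config as proposed, and the bootstrap address is made up for illustration.

Code Block
languagejava
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class AlterMaxBrokerPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Target broker 1; an empty entity name ("") would address the
            // cluster-wide default, mirroring --entity-default above.
            ConfigResource broker1 = new ConfigResource(ConfigResource.Type.BROKER, "1");
            AlterConfigOp setLimit = new AlterConfigOp(
                new ConfigEntry("max.broker.partitions", "1000"),
                AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                Collections.singletonMap(broker1, Collections.singletonList(setLimit));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}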

API Exceptions

CreateTopics, CreatePartitions and AlterPartitionReassignments APIs will throw the following exception if it is not possible to satisfy the request while respecting the max.broker.partitions limit.

Code Block
languagejava
public class WillExceedBrokerPartitionCapacityException extends ApiException { ... }

In order to be actionable, the exception message will contain a map of brokers with their respective partition capacities.

Corresponding to this exception, we will have the following API error code.

Code Block
languagejava
WILL_EXCEED_BROKER_PARTITION_CAPACITY(88, "Cannot satisfy request without exceeding the max.broker.partitions limit on brokers", WillExceedBrokerPartitionCapacityException::new); 
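As a sketch of how this surfaces to clients, the snippet below creates a topic and reports the proposed error if the placement would exceed some broker's limit. The exception and error code are the ones defined above; the topic name, sizing, and bootstrap address are made up for the example, and the error reaches the client wrapped in the usual ExecutionException.

Code Block
languagejava
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithinCapacity {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 200 partitions with 3-way replication: 600 replicas to place.
            NewTopic topic = new NewTopic("my-topic", 200, (short) 3);
            try {
                admin.createTopics(Collections.singleton(topic)).all().get();
            } catch (ExecutionException e) {
                // With this KIP, the cause would be the proposed
                // WillExceedBrokerPartitionCapacityException; its message lists
                // brokers with their respective partition capacities, so it can
                // be surfaced directly to the operator.
                System.err.println("Topic creation failed: " + e.getCause().getMessage());
            }
        }
    }
}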

Proposed Changes

We will introduce a new per-broker int32 configuration called max.broker.partitions that Kafka administrators can specify in the server.properties file or modify via ZooKeeper. This will have a default value equal to the maximum int32 value that Java can represent (2³¹ - 1), which should be sufficient for the foreseeable future. In other words, we don't need int64.
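As an illustration of the shape of this configuration, the sketch below declares it with Kafka's generic ConfigDef builder. Only the name, type, default, and per-broker update mode come from this proposal; the containing class, the validator, and the importance level are assumptions made for the example (the real declaration would live in the broker's KafkaConfig).

Code Block
languagejava
import org.apache.kafka.common.config.ConfigDef;

public class MaxBrokerPartitionsConfigSketch {
    public static final String MAX_BROKER_PARTITIONS_CONFIG = "max.broker.partitions";
    public static final String MAX_BROKER_PARTITIONS_DOC =
        "The maximum number of partitions that this broker may host.";

    // Default of Integer.MAX_VALUE (2^31 - 1) keeps existing clusters unaffected.
    public static ConfigDef define(ConfigDef def) {
        return def.define(
            MAX_BROKER_PARTITIONS_CONFIG,
            ConfigDef.Type.INT,
            Integer.MAX_VALUE,
            ConfigDef.Range.atLeast(1),   // assumed validator: at least one partition
            ConfigDef.Importance.MEDIUM,  // assumed importance level
            MAX_BROKER_PARTITIONS_DOC);
    }
}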

...

Further, in environments such as Docker, where it is possible to allocate more resources to the broker on the fly, it would be restrictive if the max.broker.partitions configuration could not also be modified on the fly.

Appendix

A1: Producer performance with lots of partitions

The following table shows the impact on producer performance in the context of 3 m5.large EC2 broker instances, with 1 KB-sized records and a requested throughput of 240 Mbps on a single topic. The table indicates that throughput with 3-way replication improves from 52 Mbps to 101 Mbps in going from 10 to 100 partitions, but then degrades beyond that. Also, the throughput with 1-way replication is better than that with 3-way replication. This is because the number of replication Fetch requests a broker receives increases with the number of partitions on the broker, and with 1-way replication the broker does not have to deal with Fetch requests at all. But even so, the performance is much worse with 10000 partitions than with 10 partitions.

TODO: Redo these experiments without specifying a requested throughput.

Produce throughput (Mbps)

                 Number of partitions on a broker
Replication      10      100     500     1000    5000    10000
3-way            52      101     86      12      2.5     0.9
1-way            142     113     TODO    TODO    24      16

A2: Recovery time with lots of partitions

Having lots of partitions on a broker can also increase the recovery time in case of broker failure. TODO