...

We ran performance experiments to understand the effect of increasing the number of partitions. Appendix A1 describes the impact on producer performance, and Appendix A2 describes the impact on recovery time under controller failure.

To mitigate these issues, we propose two configurations: (a) max.broker.partitions, to limit the number of partitions per broker, and (b) max.partitions, to limit the number of partitions in the cluster overall.
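For illustration, the following is a minimal sketch of how an administrator might apply these limits via the Java Admin client. It assumes the proposed configurations are exposed as cluster-wide dynamic broker configs (rather than only as static server.properties settings); the bootstrap address and the limit values are placeholders.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class PartitionLimitConfigSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                // "" denotes the default (cluster-wide) broker config resource.
                ConfigResource allBrokers = new ConfigResource(ConfigResource.Type.BROKER, "");

                // Limits proposed in this KIP; the values below are illustrative only.
                Collection<AlterConfigOp> ops = List.of(
                        new AlterConfigOp(new ConfigEntry("max.broker.partitions", "4000"),
                                          AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("max.partitions", "20000"),
                                          AlterConfigOp.OpType.SET));

                admin.incrementalAlterConfigs(Map.of(allBrokers, ops)).all().get();
            }
        }
    }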

...

This is in general a more flexible approach than the one described in this KIP: it allows brokers with different resources to each have their own max.broker.partitions configuration. However, it would require sending the broker-specific configuration to the controller, which needs it when creating topics and partitions or reassigning partitions. One approach would be to put this information into the broker's ZooKeeper znode and have the controller rely on that. Another would be to create a new API request-response that brokers can use to share this information with the controller. Both of these approaches introduce complexity for little gain. We are not aware of any clusters running with heterogeneous configurations where having a different max.broker.partitions configuration for each broker would help. Therefore, in this KIP, we do not take this approach.

Appendix

...

Appendix A:

...

Performance with lots of partitions

We did a performance test (using kafka-producer-perf-test.sh from a single m5.4xlarge EC2 instance). We had 3 m5.large EC2 broker instances in 3 different AZs within the same AWS region us-east-1, running Kafka version 2.3.1. Each broker had an EBS GP2 volume attached to it for data storage. All communication was plaintext and records were not compressed.

A1: Producer performance

The following table shows results from a producer performance test. On the producer side, each record was 1 KB in size, the requested throughput was 240 Mbps on a single topic, and batch.size and linger.ms were left at their default values.
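For reference, a minimal Java sketch approximating this producer workload is shown below. The topic name, bootstrap servers, and total record count are placeholders, not the exact values used in the test.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    import java.util.Properties;

    public class ProducerLoadSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // placeholders
            props.put("key.serializer", ByteArraySerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());
            // batch.size and linger.ms are intentionally left at their defaults, as in the test.

            byte[] value = new byte[1024]; // 1 KB record, as in the test

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                for (long i = 0; i < 1_000_000L; i++) { // record count is a placeholder
                    producer.send(new ProducerRecord<>("perf-test-topic", value)); // placeholder topic
                }
                producer.flush();
            }
        }
    }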

The table indicates that throughput with 3-way replication improves from 52 Mbps to 101 Mbps when going from 10 to 100 partitions, but then degrades beyond that. Throughput with 1-way replication is also better than with 3-way replication. This is because the number of replication Fetch requests a broker receives increases with the number of partitions on the broker, and with 1-way replication the broker does not have to serve Fetch requests at all. Even so, performance is much worse with 10000 partitions than with 10 partitions.

TODO: Redo these experiments without specifying a requested throughput.

Produce throughput (Mbps), by number of partitions on a broker:

Replication    10      100     500     1000    5000    10000
3-way          52      101     86      12      2.5     0.9
1-way          142     113     104     99      24      16

A2: Recovery time under failure

Having lots of partitions on a broker can also increase the recovery time in case of controller failure. TODO