...
We ran performance experiments to understand the effect of increasing the number of partitions. See Appendix A1 for producer performance, and Appendix A2 for topic creation and deletion times. These consistently indicate that a large number of partitions can lead to a malfunctioning cluster.
To prevent a cluster from entering a bad state due to a large number of topic partitions, we propose two configurations: (a) max.broker.partitions, which limits the number of partitions per broker, and (b) max.partitions, which limits the number of partitions in the cluster overall. These act as guard rails, ensuring that the cluster never operates with more partitions than it can handle.
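The intended effect of the two limits can be sketched as a simple admission check. This is only an illustration under stated assumptions: the class `PartitionLimitCheck` and its method are hypothetical and not part of Kafka, and the sketch assumes that the per-broker limit counts every partition replica hosted on a broker.

```java
import java.util.Map;

/** Hypothetical helper (not a Kafka class): checks a request against both proposed limits. */
final class PartitionLimitCheck {

    /**
     * @param currentPerBroker         partitions currently hosted on each broker id
     * @param addedPerBroker           partitions the request would add to each broker id
     * @param currentClusterPartitions partitions that exist in the cluster today
     * @param addedClusterPartitions   partitions the request would create
     * @param maxBrokerPartitions      value of max.broker.partitions
     * @param maxPartitions            value of max.partitions
     * @return true only if neither limit would be exceeded after the request
     */
    static boolean withinLimits(Map<Integer, Integer> currentPerBroker,
                                Map<Integer, Integer> addedPerBroker,
                                int currentClusterPartitions,
                                int addedClusterPartitions,
                                int maxBrokerPartitions,
                                int maxPartitions) {
        // Cluster-wide guard rail (long arithmetic avoids int overflow).
        if ((long) currentClusterPartitions + addedClusterPartitions > maxPartitions) {
            return false;
        }
        // Per-broker guard rail: only brokers that receive new partitions can be pushed over.
        for (Map.Entry<Integer, Integer> e : addedPerBroker.entrySet()) {
            int current = currentPerBroker.getOrDefault(e.getKey(), 0);
            if ((long) current + e.getValue() > maxBrokerPartitions) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with max.broker.partitions set to 100, a broker that already hosts 90 partitions could accept 10 more but not 11.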
...
The following table shows the list of methods that will need to change in order to support the max.broker.partitions and max.partitions configurations. (We skip a few internal methods for the sake of simplicity.)
Method name | Relevant methods which directly depend on this one | Relevant methods on which this one is directly dependent | Description of what the method does currently | Context in which used |
---|---|---|---|---|
`AdminUtils.assignReplicasToBrokers` | `AdminZkClient.createTopic` `AdminZkClient.addPartitions` `AdminManager.createTopics` `ReassignPartitionsCommand.generateAssignment` |  |  |  |
`AdminZkClient.createTopicWithAssignment` | `AdminZkClient.createTopic` `AdminManager.createTopics` `ZookeeperTopicService.createTopic` |  | Creates the ZooKeeper znodes required for topic-specific configuration and replica assignments for the partitions of the topic. |  |
`AdminZkClient.createTopic` | `KafkaApis.createTopic` `ZookeeperTopicService.createTopic` | `AdminUtils.assignReplicasToBrokers` `AdminZkClient.createTopicWithAssignment` | Computes replica assignment using `AdminUtils.assignReplicasToBrokers` and then reuses `AdminZkClient.createTopicWithAssignment`. |  |
`AdminZkClient.addPartitions` | `AdminManager.createPartitions` `ZookeeperTopicService.alterTopic` | `AdminUtils.assignReplicasToBrokers` |  |  |
`AdminManager.createTopics` | `KafkaApis.handleCreateTopicsRequest` | `AdminUtils.assignReplicasToBrokers` `AdminZkClient.createTopicWithAssignment` |  |  |
`AdminManager.createPartitions` | `KafkaApis.handleCreatePartitionsRequest` | `AdminZkClient.addPartitions` | Used exclusively by `KafkaApis.handleCreatePartitionsRequest` to create partitions on an existing topic. |  |
`KafkaController.onPartitionReassignment` | `KafkaApis.handleAlterPartitionReassignmentsRequest` (indirectly; the intermediate calls are not relevant here) |  | Handles all the modifications required on ZooKeeper znodes and sends the API requests required for moving partitions from some brokers to others. |  |
`KafkaApis.handleCreateTopicsRequest` |  | `AdminManager.createTopics` | Handles the CreateTopics API request sent to a broker, if that broker is the controller. |  |
`KafkaApis.handleCreatePartitionsRequest` |  | `AdminManager.createPartitions` | Handles the CreatePartitions API request sent to a broker, if that broker is the controller. |  |
`KafkaApis.handleAlterPartitionReassignmentsRequest` |  | `KafkaController.onPartitionReassignment` (indirectly; the intermediate calls are not relevant here) | Handles the AlterPartitionReassignments API request sent to a broker, if that broker is the controller. |  |
`KafkaApis.createTopic` | `KafkaApis.handleTopicMetadataRequest` `KafkaApis.handleFindCoordinatorRequest` (indirectly; the intermediate calls are not relevant here) | `AdminZkClient.createTopic` |  |  |
`KafkaApis.handleTopicMetadataRequest` |  | `KafkaApis.createTopic` (indirectly; the intermediate calls are not relevant here) | Handles the Metadata API request sent to a broker. |  |
`KafkaApis.handleFindCoordinatorRequest` |  | `KafkaApis.createTopic` (indirectly; the intermediate calls are not relevant here) | Handles the FindCoordinator API request sent to a broker. |  |
`ZookeeperTopicService.createTopic` |  | `AdminZkClient.createTopic` `AdminZkClient.createTopicWithAssignment` |  |  |
`ZookeeperTopicService.alterTopic` |  | `AdminZkClient.addPartitions` |  |  |
`ReassignPartitionsCommand.generateAssignment` |  | `AdminUtils.assignReplicasToBrokers` | Used by the ./kafka-reassign-partitions.sh admin tool to generate a replica assignment of partitions for the specified topics onto the set of specified brokers. |  |
For all the methods in the above table that are used in the context of both Kafka API request handling paths and ZooKeeper-based admin tools (`AdminUtils.assignReplicasToBrokers`, `AdminZkClient.createTopicWithAssignment`, `AdminZkClient.createTopic` and `AdminZkClient.addPartitions`), we will pass the values for maximum number of partitions per broker, maximum number of partitions overall, and the current number of partitions for each broker as arguments.
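To make the role of the three extra arguments concrete, here is a minimal sketch of a replica assignment routine that receives the limits and the current per-broker counts and refuses to exceed either limit. It is a simplified round-robin stand-in, not the algorithm in `AdminUtils.assignReplicasToBrokers` (which also handles rack awareness and randomized start positions). The class name, method signature, and the extra `currentClusterPartitions` argument are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not a Kafka class): round-robin assignment that honours both limits. */
final class LimitAwareAssignment {

    /** Returns, for each new partition, the list of broker ids that will host its replicas. */
    static List<List<Integer>> assign(int numPartitions,
                                      int replicationFactor,
                                      Map<Integer, Integer> currentPerBroker,
                                      int currentClusterPartitions,
                                      int maxBrokerPartitions,
                                      int maxPartitions) {
        if ((long) currentClusterPartitions + numPartitions > maxPartitions) {
            throw new IllegalStateException("max.partitions would be exceeded");
        }
        // Sorted working copy: deterministic broker order, caller's map stays untouched.
        TreeMap<Integer, Integer> load = new TreeMap<>(currentPerBroker);
        List<Integer> brokers = new ArrayList<>(load.keySet());
        List<List<Integer>> assignment = new ArrayList<>();

        for (int p = 0; p < numPartitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            // Visit each broker at most once per partition, starting at a rotating offset,
            // and skip brokers that have already reached max.broker.partitions.
            for (int i = 0; i < brokers.size() && replicas.size() < replicationFactor; i++) {
                int broker = brokers.get((p + i) % brokers.size());
                if (load.get(broker) < maxBrokerPartitions) {
                    replicas.add(broker);
                    load.merge(broker, 1, Integer::sum);
                }
            }
            if (replicas.size() < replicationFactor) {
                throw new IllegalStateException("max.broker.partitions would be exceeded");
            }
            assignment.add(replicas);
        }
        return assignment;
    }
}
```

Passing the limits and counts as plain arguments, rather than reading them from broker state inside the method, keeps such a routine usable from both the API request handling paths and the ZooKeeper-based admin tools.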
...
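We can see that leadership resignation times grow much faster than linearly with the number of partitions: tripling the partition count from 10000 to 30000 raises the resignation time from 9 minutes to 100 minutes.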
Number of partitions left | 30000 | 20000 | 10000 | 5000 | 1000 | 100 | 10 |
---|---|---|---|---|---|---|---|
Leadership resignation time (minutes) | 100 | 43 | 9 | 3 | < 1 | < 1 | < 1 |
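One rough way to read the growth rate off these measurements is to compute the log-log slope between adjacent data points, i.e. the exponent k in time ∝ partitions^k. The sketch below does this for the four measurements that are at least one minute; the class and method names are illustrative only, and with so few data points the result should be treated as indicative rather than conclusive.

```java
/** Illustrative helper: estimates how fast resignation time grows with partition count. */
final class ResignationGrowth {
    // Values copied from the table above; entries reported as "< 1" minute are omitted.
    static final int[] PARTITIONS = {5000, 10000, 20000, 30000};
    static final double[] MINUTES = {3, 9, 43, 100};

    /**
     * Log-log slope between measurement i and i + 1.
     * A slope of 1 means linear growth; a slope above 1 means faster-than-linear growth.
     */
    static double localExponent(int i) {
        double timeRatio = MINUTES[i + 1] / MINUTES[i];
        double sizeRatio = (double) PARTITIONS[i + 1] / PARTITIONS[i];
        return Math.log(timeRatio) / Math.log(sizeRatio);
    }
}
```

Calling `ResignationGrowth.localExponent(i)` for `i` = 0, 1, 2 gives a slope above 1 for every adjacent pair of measurements.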