Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
When a large number of topics or partitions are deleted, the Controller can become backed up and start affecting other parts of the system. Because topic deletes are asynchronous, it is difficult to know when the Controller has worked through the queue of topics marked for deletion. Currently, the only way to check progress is to look directly in ZooKeeper at the /admin/delete_topics znode. In a production environment this is cumbersome and somewhat dangerous, since it means poking around in ZK for a running Kafka cluster.
Proposed Changes
A new JMX gauge is proposed for KafkaController, kafka.controller:type=KafkaController,name=TopicsToDeleteCount, which returns an integer value for the number of topics known to the Controller that are enqueued for deletion. Rather than listing the children of the znode directly, this metric will read the size of the Controller's internal set of topics to be deleted. During initialization and controller re-elections, this value will be zero because the Controller has not yet read the list of topics from ZK.
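As a rough illustration of the proposed behaviour, the sketch below registers a stand-in gauge under the proposed object name in a private, in-process MBeanServer and reads it back. It is not the Controller's implementation: the class names, the topicsToBeDeleted set, and the topic names are hypothetical, and the attribute name Value is an assumption based on how Kafka's existing gauges are usually exposed over JMX.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class TopicsToDeleteGaugeSketch {
    // Object name taken from the proposal text.
    static final String GAUGE_NAME =
        "kafka.controller:type=KafkaController,name=TopicsToDeleteCount";

    // Hypothetical management interface; getValue() surfaces as the JMX attribute "Value".
    public interface DeletionGaugeMBean {
        int getValue();
    }

    // Stand-in for the Controller side: reports the size of an in-memory set
    // instead of listing the children of the ZooKeeper znode.
    static final class DeletionGauge implements DeletionGaugeMBean {
        private final Set<String> topicsToBeDeleted;

        DeletionGauge(Set<String> topicsToBeDeleted) {
            this.topicsToBeDeleted = topicsToBeDeleted;
        }

        @Override
        public int getValue() {
            return topicsToBeDeleted.size();
        }
    }

    // Reads the gauge the way a metrics client would: by object name and attribute.
    static int readGauge(MBeanServer server, ObjectName name) throws Exception {
        return ((Number) server.getAttribute(name, "Value")).intValue();
    }

    // Returns the gauge value at startup, after two topics are enqueued,
    // and after one of them has been fully deleted.
    static int[] demo() {
        try {
            // A private MBeanServer keeps the sketch from touching global JMX state.
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            ObjectName name = new ObjectName(GAUGE_NAME);
            Set<String> topicsToBeDeleted = ConcurrentHashMap.newKeySet();
            server.registerMBean(
                new StandardMBean(new DeletionGauge(topicsToBeDeleted), DeletionGaugeMBean.class),
                name);

            // Nothing has been loaded yet, mirroring initialization or re-election.
            int atStartup = readGauge(server, name);

            topicsToBeDeleted.add("orders");    // hypothetical topic names
            topicsToBeDeleted.add("payments");
            int afterEnqueue = readGauge(server, name);

            topicsToBeDeleted.remove("orders"); // one deletion has completed
            int afterOneDeleted = readGauge(server, name);

            return new int[] {atStartup, afterEnqueue, afterOneDeleted};
        } catch (Exception e) {
            throw new IllegalStateException("JMX sketch failed", e);
        }
    }

    public static void main(String[] args) {
        int[] values = demo();
        System.out.println("startup=" + values[0]
            + " enqueued=" + values[1]
            + " remaining=" + values[2]);
    }
}
```

Against a running broker, a client would instead obtain an MBeanServerConnection through JMXConnectorFactory and query the same object name; the JMX host and port depend on the deployment and are not specified here.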
Compatibility, Deprecation, and Migration Plan
Since this change only adds a new metric, it should not affect existing metrics-gathering clients.
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.