Status
Current state: Under Discussion
Discussion thread: here
JIRA: KAFKA-8753
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Since topic deletion cannot always be performed immediately (due to offline replicas, partitions being reassigned, etc.), the Controller marks a topic for deletion and enqueues it for later processing. When a large number of topics (partitions, really) are deleted at once, it can take significant time for the Controller to process everything, and it is not unusual for the Controller to get bogged down. During these times, it would be useful to know how many topics still remain to be deleted. Currently, the only way to check on the progress is to look directly in ZooKeeper at the /admin/delete_topics znode. In a production environment this is rather cumbersome and somewhat ill-advised, since it means poking around in ZK on a running Kafka cluster.
Proposed Changes
A new JMX gauge is proposed for KafkaController, kafka.controller:type=KafkaController,name=TopicsToDeleteCount, which returns an integer value for the number of topics known to the Controller that are enqueued for deletion. Rather than listing the children of the znode directly, this metric will read the size of the Controller's internal set of topics to be deleted. During initialization and controller re-elections, this value will be zero, as the Controller has not yet read in the list of topics from ZK.
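As a rough illustration of how a monitoring client might consume the proposed gauge, the following Java sketch reads the MBean named above through the standard JMX API. It is not part of the proposal's implementation: the class TopicsToDeleteGaugeSketch, the GaugeMBean interface, and the registerStandIn helper are hypothetical names used only so the example runs without a broker, and the sketch assumes the gauge is exposed through a "Value" attribute, as Kafka's JMX-reported gauges generally are.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class TopicsToDeleteGaugeSketch {
    // MBean name taken from the proposal.
    static final String MBEAN_NAME =
        "kafka.controller:type=KafkaController,name=TopicsToDeleteCount";

    // Hypothetical stand-in for the gauge's management interface.
    // Assumption: the real gauge exposes a single "Value" attribute.
    public interface GaugeMBean {
        int getValue();
    }

    // Reads the gauge from any MBeanServerConnection (local or remote).
    static int readTopicsToDeleteCount(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName(MBEAN_NAME), "Value");
        return ((Number) value).intValue();
    }

    // Registers a stand-in gauge that reports the size of an in-memory set,
    // mirroring the idea of reading the Controller's internal set rather
    // than listing the children of the znode.
    static void registerStandIn(MBeanServer server, Set<String> topicsToBeDeleted)
            throws Exception {
        ObjectName name = new ObjectName(MBEAN_NAME);
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);
        }
        GaugeMBean gauge = topicsToBeDeleted::size;
        server.registerMBean(new StandardMBean(gauge, GaugeMBean.class), name);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> topicsToBeDeleted = ConcurrentHashMap.newKeySet();
        registerStandIn(server, topicsToBeDeleted);

        // Before any topics are enqueued the gauge reads zero, as it would
        // during initialization or a controller re-election.
        System.out.println("before: " + readTopicsToDeleteCount(server));

        topicsToBeDeleted.add("topic-a");
        topicsToBeDeleted.add("topic-b");
        System.out.println("after enqueue: " + readTopicsToDeleteCount(server));
    }
}
```

Against a real broker, the same readTopicsToDeleteCount call would take an MBeanServerConnection obtained from a remote JMX connector instead of the local platform server; the stand-in registration exists only to keep the sketch self-contained.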
Compatibility, Deprecation, and Migration Plan
Since this only adds a new metric, it should not affect any existing metrics-gathering clients.
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.