Current state: Accepted
Discussion thread:
JIRA:
Released: 1.1.0 (WIP)
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Ensuring that the Kafka Controller is healthy is an important part of monitoring the health of a Kafka cluster. This KIP is a follow-up to KIP-143 and adds more Kafka Controller metrics that are useful for monitoring controller health.
All of the following metrics will be added via the Yammer metrics library, like most of the broker metrics.
(1) kafka.controller:type=ControllerEventManager,name=EventQueueSize
type: gauge
value: size of the ControllerEventManager's queue.
(2) kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs
type: histogram
value: time that an event (any event except the Idle event) waits in the ControllerEventManager's queue before being processed.
(3) kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=someId
type: timer
value: the rate (requests per second) at which the ControllerChannelManager takes requests from the queue of the given broker, and the time a request stays in this queue before it is taken from the queue.
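As an illustration of how a monitoring client might address these metrics over JMX, the following is a minimal sketch that builds the three MBean names with the standard javax.management.ObjectName class. The helper class ControllerMetricNames and its method names are hypothetical and not part of Kafka. The brokerId key is copied from the name in (3) as written in this KIP; the key under which the broker id is actually registered should be checked against a running broker.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper (not part of Kafka): JMX names for the metrics listed above. */
final class ControllerMetricNames {
    private ControllerMetricNames() {}

    /** (1) Gauge: size of the ControllerEventManager's queue. */
    static ObjectName eventQueueSize() {
        return parse("kafka.controller:type=ControllerEventManager,name=EventQueueSize");
    }

    /** (2) Histogram: time an event waits in the ControllerEventManager's queue. */
    static ObjectName eventQueueTimeMs() {
        return parse("kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs");
    }

    /** (3) Timer: per-broker request rate and queue time. The "brokerId" key follows this KIP's text. */
    static ObjectName requestRateAndQueueTimeMs(int brokerId) {
        return parse("kafka.controller:type=ControllerChannelManager,"
                + "name=RequestRateAndQueueTimeMs,brokerId=" + brokerId);
    }

    // Wraps the checked exception so callers do not have to handle it for constant names.
    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid JMX object name: " + name, e);
        }
    }
}
```

A client that already holds an MBeanServerConnection to the broker could pass one of these names to getAttribute to read a value; the attribute names exposed for each metric type are not specified here and would need to be inspected on the broker.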
We will add the relevant metrics as specified in the Public Interfaces section.
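To make the queue-size and queue-time measurements concrete, here is a minimal sketch of one way such values could be captured: each event is stamped when it is enqueued, and the elapsed time is reported when it is dequeued. This is not the controller's actual code. The class TimedEventQueue and its recorder callback are hypothetical, the sketch uses only the JDK rather than the Yammer library, and it does not model the exclusion of the Idle event.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical sketch (not Kafka code): a queue that reports how long each event waited. */
final class TimedEventQueue<E> {
    /** Pairs an event with the time at which it was enqueued. */
    private static final class Entry<E> {
        final E event;
        final long enqueuedNanos;

        Entry(E event, long enqueuedNanos) {
            this.event = event;
            this.enqueuedNanos = enqueuedNanos;
        }
    }

    private final LinkedBlockingQueue<Entry<E>> queue = new LinkedBlockingQueue<>();
    // Receives each observed wait in milliseconds; a histogram's update method could be plugged in here.
    private final LongConsumer queueTimeMsRecorder;

    TimedEventQueue(LongConsumer queueTimeMsRecorder) {
        this.queueTimeMsRecorder = queueTimeMsRecorder;
    }

    /** Enqueues an event and remembers when it entered the queue. */
    void put(E event) {
        queue.add(new Entry<>(event, System.nanoTime()));
    }

    /** The value a queue-size gauge would report. */
    int size() {
        return queue.size();
    }

    /** Removes the next event, recording its wait time; returns null if the queue is empty. */
    E poll() {
        Entry<E> entry = queue.poll();
        if (entry == null) {
            return null;
        }
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - entry.enqueuedNanos);
        queueTimeMsRecorder.accept(waitedMs);
        return entry.event;
    }
}
```

The same pattern would apply per broker for the request queues in (3), with the recorder feeding a timer so that both the dequeue rate and the queue time are captured from a single update.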
We are only introducing new metrics, so there is no compatibility impact.