Status

Current state: Accepted

Discussion thread:

JIRA:

Released: 1.1.0 (WIP)

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Ensuring that the Kafka Controller is healthy is an important part of monitoring the health of a Kafka cluster. This KIP is a follow-up to KIP-143 and adds more Kafka Controller metrics that are useful for monitoring controller health.

Public Interfaces

All of the following metrics will be added via the Yammer metrics library, like most of the existing broker metrics.


(1) kafka.controller:type=ControllerEventManager,name=EventQueueSize

type: gauge
value: size of the ControllerEventManager's queue.

(2) kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs

type: histogram
value: the time each event (except the Idle event) waits in the ControllerEventManager's queue before it is processed

(3) kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=someId

type: timer
value: the rate (requests per second) at which the ControllerChannelManager takes requests from the queue of the given broker, and the time each request stays in this queue before it is taken from the queue.
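To make the semantics of metrics (1) and (2) concrete, here is a minimal, single-threaded sketch using only the Java standard library. It is not the controller's code and does not use the Yammer API: the class `EventManagerMetricsSketch`, the `"Idle"` event name, and the event names in the usage example are hypothetical, an `IntSupplier` stands in for a Yammer `Gauge`, and a `List<Long>` stands in for a Yammer `Histogram`. The gauge is read on demand, while the queue time is recorded once per dequeued event, skipping the Idle event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Hypothetical, single-threaded model of the ControllerEventManager metrics.
// A real implementation would use Yammer Gauge/Histogram objects and a
// dedicated event thread; plain Java types stand in for both here.
final class EventManagerMetricsSketch {
    static final String IDLE = "Idle"; // hypothetical name for the Idle event

    // An event name plus the time (ms) at which it was enqueued.
    private static final class QueuedEvent {
        final String name;
        final long enqueuedAtMs;

        QueuedEvent(String name, long enqueuedAtMs) {
            this.name = name;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final Deque<QueuedEvent> queue = new ArrayDeque<>();
    private final List<Long> queueTimeMsSamples = new ArrayList<>(); // stands in for a Histogram
    private final LongSupplier clockMs; // injected clock keeps the sketch deterministic

    // EventQueueSize: the gauge stores nothing; it reads the size when sampled.
    final IntSupplier eventQueueSize = queue::size;

    EventManagerMetricsSketch(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    void put(String eventName) {
        queue.addLast(new QueuedEvent(eventName, clockMs.getAsLong()));
    }

    // EventQueueTimeMs: on dequeue, record how long the event waited,
    // skipping the Idle event. Returns the event name, or null if empty.
    String processNext() {
        QueuedEvent event = queue.pollFirst();
        if (event == null) {
            return null;
        }
        if (!IDLE.equals(event.name)) {
            queueTimeMsSamples.add(clockMs.getAsLong() - event.enqueuedAtMs);
        }
        return event.name;
    }

    List<Long> eventQueueTimeMsSamples() {
        return queueTimeMsSamples;
    }
}
```

For example, with a clock that reads 0 ms when `"TopicChange"` is enqueued, 5 ms when `"BrokerChange"` and `"Idle"` are enqueued, and 10 ms when all three are processed, the gauge reads 3 before processing and 0 afterwards, and the recorded queue times are 10 ms and 5 ms; the Idle event contributes no sample. Injecting the clock only serves to make the example deterministic.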

Proposed Changes

We will add the relevant metrics as specified in the Public Interfaces section.
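As a rough illustration of metric (3), the sketch below keeps one timer-like object per destination broker and builds its JMX name with `javax.management.ObjectName`. The classes `BrokerQueueTimer` and `ControllerChannelMetrics` are hypothetical. A Yammer `Timer` also maintains exponentially weighted 1-, 5- and 15-minute rates and a sampled histogram of durations; this sketch only tracks the count, the mean rate since creation, and the mean and maximum queue time.

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical per-broker stand-in for a Yammer Timer: counts requests taken
// from one broker's queue and accumulates how long each one waited there.
final class BrokerQueueTimer {
    private final long createdAtMs;
    private long count;
    private long totalQueueTimeMs;
    private long maxQueueTimeMs;

    BrokerQueueTimer(long createdAtMs) {
        this.createdAtMs = createdAtMs;
    }

    // Called when a request is taken from the broker's queue.
    void update(long enqueuedAtMs, long dequeuedAtMs) {
        long waitedMs = dequeuedAtMs - enqueuedAtMs;
        count++;
        totalQueueTimeMs += waitedMs;
        maxQueueTimeMs = Math.max(maxQueueTimeMs, waitedMs);
    }

    long count() { return count; }

    long maxQueueTimeMs() { return maxQueueTimeMs; }

    double meanQueueTimeMs() {
        return count == 0 ? 0.0 : (double) totalQueueTimeMs / count;
    }

    // Mean rate in requests per second since the timer was created.
    double meanRatePerSec(long nowMs) {
        long elapsedMs = nowMs - createdAtMs;
        return elapsedMs <= 0 ? 0.0 : count * 1000.0 / elapsedMs;
    }
}

final class ControllerChannelMetrics {
    private final Map<Integer, BrokerQueueTimer> timers = new HashMap<>();

    // One timer per destination broker, created on first use.
    BrokerQueueTimer timerFor(int brokerId, long nowMs) {
        return timers.computeIfAbsent(brokerId, id -> new BrokerQueueTimer(nowMs));
    }

    // JMX name of the given broker's timer; brokerId is a key property.
    static ObjectName mbeanName(int brokerId) {
        try {
            return new ObjectName("kafka.controller:type=ControllerChannelManager,"
                    + "name=RequestRateAndQueueTimeMs,brokerId=" + brokerId);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Keying the timers by broker ID mirrors the `brokerId` key property in the metric name, so a monitoring tool can tell which broker's request queue is backing up.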

Compatibility, Deprecation, and Migration Plan

We are only introducing new metrics, so there is no compatibility impact.

Rejected Alternatives

  1. Use Kafka metrics instead of Yammer metrics: most of the broker metrics use Yammer metrics, so it makes sense to stick with them until there is a plan to migrate all of them to Kafka metrics.