The contents of this KIP were authored by Jun Rao.

Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Ensuring that the Kafka Controller is healthy is an important part of monitoring the health of a Kafka cluster. However, the metrics currently exposed are not sufficient for reliably detecting issues such as slow progress or deadlocks. We propose a few new metrics to address this gap. Even though KAFKA-5028 will potentially fix existing deadlocks, there will still be known (and potentially unknown) issues that can cause slow or no progress, so these metrics will remain useful.

...

(12) kafka.cluster:type=Partition,name=FailedIsrUpdateRate

Proposed Changes

We will add the relevant metric type to one of KafkaController, ControllerStats, ControllerChannelManager or Partition as specified in the Public Interfaces section.
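
As an illustration of how a monitoring tool might consume one of these metrics once it is exposed, here is a minimal Java sketch that builds the JMX object name for metric (12) and tries to read one attribute from it. The class name IsrMetricLookup, the helper readAttribute and the attribute name "OneMinuteRate" are assumptions made for this example; the KIP text shown here does not specify attribute names. The sketch queries the local platform MBean server, so outside a broker JVM it simply reports that the metric is not registered.

```java
import java.lang.management.ManagementFactory;
import java.util.Optional;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper class, not part of Kafka.
public class IsrMetricLookup {
    // Metric name copied from item (12) above.
    static final String FAILED_ISR_UPDATE_RATE =
        "kafka.cluster:type=Partition,name=FailedIsrUpdateRate";

    // Parse the metric name into a JMX ObjectName.
    static ObjectName metricName() throws MalformedObjectNameException {
        return new ObjectName(FAILED_ISR_UPDATE_RATE);
    }

    // Read one attribute of the MBean, or return empty if the MBean
    // is not registered or the attribute cannot be read.
    static Optional<Object> readAttribute(MBeanServer server,
                                          ObjectName name,
                                          String attribute) {
        if (!server.isRegistered(name)) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(server.getAttribute(name, attribute));
        } catch (JMException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = metricName();
        System.out.println("domain=" + name.getDomain()
            + " type=" + name.getKeyProperty("type")
            + " name=" + name.getKeyProperty("name"));

        // Assumption: the rate metric exposes a "OneMinuteRate" attribute.
        Optional<Object> rate = readAttribute(server, name, "OneMinuteRate");
        System.out.println(rate.map(Object::toString)
            .orElse("metric not registered in this JVM"));
    }
}
```

A real monitoring client would typically connect to the broker's remote JMX port rather than the local platform MBean server; the lookup by object name would otherwise be the same.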

Compatibility, Deprecation, and Migration Plan

We are introducing new metrics, so there is no compatibility impact.

Rejected Alternatives

  • Don't add these metrics: rejected because these issues are currently difficult to detect, they impact cluster health, and the overhead of the proposed metrics is low.

...