Status

Current state: Accepted

Discussion thread: here

JIRA: KAFKA-6263

...

Add the following metrics via a sensor:

  • kafka.server:type=group-coordinator-metrics,name=group-load-time-max

Type: SampledStat.Max

Value: 0 or greater over time; maximum time, in milliseconds, it took to load offsets and group metadata from the __consumer_offsets partitions loaded in the last 30 seconds.

...

  • kafka.server:type=group-coordinator-metrics,name=group-load-time-avg

Type: SampledStat.Avg

Value: 0 or greater over time; average time, in milliseconds, it took to load offsets and group metadata from the __consumer_offsets partitions loaded in the last 30 seconds.

Note: this average may look very low when a majority of the partitions are unused, because unused partitions can report load times of 0 milliseconds.

  • kafka.server:type=transaction-coordinator-metrics,name=transaction-load-time-max

Type: SampledStat.Max

Value: 0 or greater over time; maximum time, in milliseconds, it took to load offsets and transaction state from the __transaction_state partitions loaded in the last 30 seconds.

  • kafka.server:type=transaction-coordinator-metrics,name=transaction-load-time-avg

Type: SampledStat.Avg

Value: 0 or greater over time; average time, in milliseconds, it took to load offsets and transaction state from the __transaction_state partitions loaded in the last 30 seconds.

Note: this average may look very low when a majority of the partitions are unused, because unused partitions can report load times of 0 milliseconds.
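To make the note about low averages concrete, here is a small worked example. The load times are made-up values chosen for illustration, not measurements from a real broker.

```python
from statistics import mean

# Hypothetical load times, in milliseconds, for five partitions that
# finished loading within the same window. Three of them were unused
# and loaded in roughly 0 ms.
load_times_ms = [0, 0, 0, 1200, 1800]

avg_ms = mean(load_times_ms)  # pulled down by the unused partitions
max_ms = max(load_times_ms)   # not affected by the unused partitions

print(f"avg={avg_ms} ms, max={max_ms} ms")
```

The two partitions that actually held data took over a second each, yet the average is well below either of them. This is why the max metric is worth watching alongside the average.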

Proposed Changes

For each of the group metadata manager and transaction state manager, add a sensor that reports the maximum and average number of milliseconds it took to load each partition. The max and average are computed over a running window covering the partitions that finished loading in the last 30 seconds. Lengthening or shortening this window is up for discussion (the default is 30 seconds).
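The windowed computation described above can be sketched as follows. This is a simplified illustration, not Kafka's SampledStat implementation: the class name `LoadTimeWindow`, its methods, and the choice to report 0 for an empty window are all assumptions made for this example. It also assumes samples are recorded in order of completion time.

```python
from collections import deque


class LoadTimeWindow:
    """Hypothetical tracker for max and avg partition load time over a sliding window."""

    def __init__(self, window_ms=30_000):
        self.window_ms = window_ms
        # (finished_at_ms, load_time_ms) pairs, oldest first.
        self._samples = deque()

    def record(self, finished_at_ms, load_time_ms):
        """Record that a partition finished loading at finished_at_ms."""
        self._samples.append((finished_at_ms, load_time_ms))

    def _expire(self, now_ms):
        # Drop samples that finished loading before the start of the window.
        while self._samples and now_ms - self._samples[0][0] > self.window_ms:
            self._samples.popleft()

    def max_ms(self, now_ms):
        self._expire(now_ms)
        # Assumption: an empty window reports 0.
        return max((t for _, t in self._samples), default=0)

    def avg_ms(self, now_ms):
        self._expire(now_ms)
        if not self._samples:
            return 0.0  # Assumption: an empty window reports 0.
        return sum(t for _, t in self._samples) / len(self._samples)


window = LoadTimeWindow()
window.record(finished_at_ms=1_000, load_time_ms=250)
window.record(finished_at_ms=5_000, load_time_ms=750)

# Both samples are inside the 30 second window at t = 10 s.
early = (window.max_ms(10_000), window.avg_ms(10_000))
# At t = 32 s the first sample has aged out and only the second remains.
late = (window.max_ms(32_000), window.avg_ms(32_000))
```

A larger `window_ms` keeps load times visible for longer after a coordinator change, while a shorter one makes the metrics return to their idle value sooner.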

Compatibility, Deprecation, and Migration Plan

...