...

Name | Context | Type | Description
kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount | Controller | Long | The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric.
kafka.controller:type=KafkaController,name=EventQueueOperationsPerformedCount | Controller | Long | The total number of event queue operations that were performed. This includes deferred operations.
kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount | Controller | Long | The total number of event queue operations that timed out before they could be performed.
kafka.controller:type=KafkaController,name=NewActiveControllersCount | Controller | Long | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts.
kafka.server:type=MetadataLoader,name=CurrentMetadataVersion | Broker and Controller | Integer | Outputs the current effective metadata version as an integer value.
kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount | Broker and Controller | Long | The total number of times we have loaded a KRaft snapshot since the process was started.
kafka.server:type=MetadataLoader,name=LatestSnapshotSize | Broker and Controller | Long | The total size in bytes of the latest snapshot, or 0 if there hasn't been one yet.
kafka.server:type=MetadataLoader,name=LatestSnapshotDelayMs | Broker and Controller | Long | The delay in milliseconds since the latest snapshot, or 0 if there hasn't been one yet.
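
As an illustration of how a metric with one of these names could be read over JMX, here is a minimal sketch. It assumes the metric is exposed as a gauge-style MBean with a single attribute named "Value"; that attribute name is an assumption, not something this proposal specifies. The class ReadKraftMetricSketch, the GaugeView interface, and the stand-in gauge are hypothetical and exist only so the example runs without a Kafka process.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class ReadKraftMetricSketch {
    /** Hypothetical management interface: one read-only attribute, "Value". */
    public interface GaugeView {
        long getValue();
    }

    /** Hypothetical stand-in for a real metric; always reports a fixed value. */
    public static final class StandInGauge implements GaugeView {
        private final long value;

        public StandInGauge(long value) {
            this.value = value;
        }

        @Override
        public long getValue() {
            return value;
        }
    }

    /** Registers a stand-in gauge under the given MBean name. */
    public static void registerStandIn(MBeanServer server, String mbeanName, long value) {
        try {
            server.registerMBean(
                new StandardMBean(new StandInGauge(value), GaugeView.class),
                new ObjectName(mbeanName));
        } catch (JMException e) {
            throw new IllegalStateException("Could not register " + mbeanName, e);
        }
    }

    /** Reads the assumed "Value" attribute of the MBean with the given name. */
    public static long readValue(MBeanServer server, String mbeanName) {
        try {
            Object value = server.getAttribute(new ObjectName(mbeanName), "Value");
            return ((Number) value).longValue();
        } catch (JMException e) {
            throw new IllegalStateException("Could not read " + mbeanName, e);
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String name = "kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount";
        registerStandIn(server, name, 3L);
        System.out.println(name + " = " + readValue(server, name));
    }
}
```

Against a real node, only readValue would be needed, pointed at an MBean server connection for that process.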

Implementation Notes

In order to avoid excessive performance impacts from these new metrics, none of them will require locks to read.
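
One way to meet the lock-free read requirement is to back each counter with an atomic variable. The following is a minimal sketch of that idea, not the actual Kafka implementation; the class LockFreeControllerCounters and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for a counter-style metric; not Kafka's actual class. */
public class LockFreeControllerCounters {
    // Writers increment atomically; readers never acquire a lock.
    private final AtomicLong timedOutBrokerHeartbeatCount = new AtomicLong(0);

    /** Assumed to be called by the code path that detects a timed-out heartbeat. */
    public void incrementTimedOutHeartbeats() {
        timedOutBrokerHeartbeatCount.incrementAndGet();
    }

    /** Assumed to be called by the metrics reporter; a single atomic read. */
    public long timedOutHeartbeats() {
        return timedOutBrokerHeartbeatCount.get();
    }
}
```

With this shape, reading the metric costs one atomic load regardless of what the controller thread is doing at the time.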

...

This metric counts the number of times we have loaded a metadata snapshot. Loading a snapshot is an O(N) operation in the size of the metadata, since it involves reloading the full metadata state, so it is helpful to know when it has occurred.

LatestSnapshotSize

This metric is useful to monitor the size of the snapshot generated by the cluster. In general, the larger the snapshot gets, the more resources the cluster will need.

LatestSnapshotDelayMs

This metric is useful to monitor how long it has been since the node last generated a snapshot. If this time grows too large, it may indicate a potential problem, since loading times might also become very large.
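
The snapshot size and delay metrics, including the "0 if there hasn't been one yet" rule from the table, could be tracked along the following lines. This is a sketch under stated assumptions: the class SnapshotMetricsSketch, its methods, and the use of a caller-supplied clock value are all hypothetical and are not taken from the Kafka code base.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker for the latest snapshot's size and age. */
public class SnapshotMetricsSketch {
    // Sentinel meaning "no snapshot has been seen yet".
    private static final long NO_SNAPSHOT = -1L;

    private final AtomicLong latestSnapshotSizeBytes = new AtomicLong(0L);
    private final AtomicLong latestSnapshotTimeMs = new AtomicLong(NO_SNAPSHOT);

    /** Records a new snapshot; nowMs is the caller's current time in milliseconds. */
    public void onSnapshot(long sizeBytes, long nowMs) {
        latestSnapshotSizeBytes.set(sizeBytes);
        latestSnapshotTimeMs.set(nowMs);
    }

    /** Size in bytes of the latest snapshot, or 0 if there has not been one. */
    public long latestSnapshotSize() {
        return latestSnapshotSizeBytes.get();
    }

    /** Milliseconds since the latest snapshot, or 0 if there has not been one. */
    public long latestSnapshotDelayMs(long nowMs) {
        long snapshotTimeMs = latestSnapshotTimeMs.get();
        if (snapshotTimeMs == NO_SNAPSHOT) {
            return 0L;
        }
        // Clamp at 0 in case the clock value supplied is older than the snapshot.
        return Math.max(0L, nowMs - snapshotTimeMs);
    }
}
```

Both fields are read without locks. Because they are updated separately, a reader may briefly observe the new size together with the old timestamp, which is usually acceptable for monitoring.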

Compatibility, Deprecation, and Migration Plan

...