The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric.

kafka.controller:type=KafkaController,name=EventQueueOperationsPerformedCount | Controllers | Long | The total number of event queue operations that were performed. This includes deferred operations.
kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount | Controllers | Long | The total number of event queue operations that timed out before they could be performed.
kafka.controller:type=KafkaController,name=NewActiveControllersCount | Controller | Long | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts.
kafka.server:type=MetadataLoader,name=CurrentMetadataVersion | Broker and Controller | Integer | Outputs the current effective metadata version as an integer value.
kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount | Broker and Controller | Long | The total number of times we have loaded a KRaft snapshot since the process was started.
kafka.server:type=MetadataLoader,name=LatestSnapshotGeneratedBytes | Broker and Controller | Long | The total size in bytes of the latest snapshot that the node has generated, or 0 if there hasn't been one yet.
kafka.server:type=MetadataLoader,name=LatestSnapshotGeneratedAgeMs | Broker and Controller | Long | The time in milliseconds since the node generated its latest snapshot, or 0 if there hasn't been one yet.


This metric counts the number of times we have loaded a metadata snapshot. Loading a snapshot is an O(N) operation in the size of the metadata, since it involves reloading the full metadata state, so it is helpful to know when it has occurred.
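Because the count only grows while the process is running, a monitoring script can detect snapshot loads by comparing two successive samples. The following is a minimal sketch in Python, assuming the samples have already been collected by some JMX exporter. The function name `snapshot_loads_between` and the restart handling are illustrative assumptions, not part of Kafka.

```python
def snapshot_loads_between(previous: int, current: int) -> int:
    """Return how many snapshot loads happened between two counter samples.

    The counter starts again from zero when the process restarts, so a
    current value below the previous one is treated as a restart (an
    assumption of this sketch) and the current value is used as the delta.
    """
    if previous < 0 or current < 0:
        raise ValueError("counter samples must be non-negative")
    if current < previous:
        # Process restarted: every load counted so far happened after it.
        return current
    return current - previous
```

A non-zero result between two scrapes means the node reloaded its full metadata state at least once in that interval.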



This metric is useful for monitoring the size of the latest snapshot that the node has generated. In general, the larger the snapshot gets, the more resources the cluster will need.



This metric is useful for monitoring how long it has been since the node last generated a snapshot. If this time grows too large, it may indicate a problem, since loading times might also become very large.
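Since the metric reports 0 when no snapshot has been generated yet, an alerting rule probably needs to treat zero separately from a genuinely small age. A minimal sketch in Python of such a check follows. The names `SnapshotAgeStatus` and `classify_snapshot_age` and the `max_age_ms` threshold are hypothetical, chosen only for illustration.

```python
from enum import Enum


class SnapshotAgeStatus(Enum):
    NO_SNAPSHOT_YET = "no_snapshot_yet"
    OK = "ok"
    STALE = "stale"


def classify_snapshot_age(age_ms: int, max_age_ms: int) -> SnapshotAgeStatus:
    """Classify a sampled snapshot age against an operator-chosen threshold."""
    if age_ms < 0:
        raise ValueError("age_ms must be non-negative")
    if max_age_ms <= 0:
        raise ValueError("max_age_ms must be positive")
    if age_ms == 0:
        # 0 is the documented "no snapshot yet" value. A snapshot generated
        # in the same millisecond as the sample would also read 0; this
        # sketch does not try to distinguish the two cases.
        return SnapshotAgeStatus.NO_SNAPSHOT_YET
    if age_ms > max_age_ms:
        return SnapshotAgeStatus.STALE
    return SnapshotAgeStatus.OK
```

For example, with a one-hour threshold, `classify_snapshot_age(7_200_000, 3_600_000)` returns `SnapshotAgeStatus.STALE`. What counts as too old depends on the cluster's snapshot settings, so the threshold is left to the caller.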
