...
Name | Context | Type | Description |
---|---|---|---|
kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount | Controller | Long | The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. |
kafka.controller:type=KafkaController,name=EventQueueOperationsPerformedCount | Controller | Long | The total number of event queue operations that were performed. This includes deferred operations. |
kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount | Controller | Long | The total number of event queue operations that timed out before they could be performed. |
kafka.controller:type=KafkaController,name=NewActiveControllersCount | Controller | Long | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. |
kafka.server:type=MetadataLoader,name=CurrentMetadataVersion | Broker and Controller | Integer | Outputs the current effective metadata version as an integer value. |
kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount | Broker and Controller | Long | The total number of times we have loaded a KRaft snapshot since the process was started. |
kafka.server:type=MetadataLoader,name=LatestSnapshotSize | Broker and Controller | Long | The total size in bytes of the latest snapshot, or 0 if there hasn't been one yet. |
kafka.server:type=MetadataLoader,name=LatestSnapshotDelayMs | Broker and Controller | Long | The delay in milliseconds since the latest snapshot, or 0 if there hasn't been one yet. |
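The names in the first column are JMX object names, so a monitoring tool can read each metric by name. The following is a minimal sketch, assuming that each metric is exposed as an MBean with a numeric attribute called `Value` (as Yammer-style JMX gauges typically are). `KRaftMetricReader` and the stand-in `StandInGauge` MBean are hypothetical and exist only so the sketch runs without a Kafka process.

```java
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of reading the metrics in the table over JMX.
// Assumption: each metric is an MBean with a numeric "Value" attribute.
class KRaftMetricReader {
    private final MBeanServer server;

    KRaftMetricReader(MBeanServer server) {
        this.server = server;
    }

    // Reads the "Value" attribute of the MBean with the given object name.
    // Converting through Number covers both the Long and Integer metrics.
    long read(String objectName) throws Exception {
        Object value = server.getAttribute(new ObjectName(objectName), "Value");
        return ((Number) value).longValue();
    }

    // Hypothetical stand-in MBean, not part of Kafka. Standard MBean
    // interfaces must be public and named after the implementing class
    // plus "MBean"; the getter getValue() becomes the attribute "Value".
    public interface StandInGaugeMBean {
        long getValue();
    }

    public static class StandInGauge implements StandInGaugeMBean {
        private final long value;

        public StandInGauge(long value) {
            this.value = value;
        }

        @Override
        public long getValue() {
            return value;
        }
    }
}
```

Against a real node, the `MBeanServer` would be replaced by an `MBeanServerConnection` obtained from a JMX connector. That interface also provides `getAttribute(ObjectName, String)`.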
Implementation Notes
To avoid any excessive performance impact from these new metrics, none of them will require locks to read.
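A common way to make metrics readable without locks in Java is to store them in atomic or `volatile` fields. The following is a minimal sketch of that approach, assuming one metadata-loading thread that writes and any number of reader threads (for example, JMX). The class and method names are hypothetical and do not describe Kafka's actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of lock-free metric state. Names are hypothetical.
class LockFreeLoaderMetrics {
    // AtomicLong keeps the count correct even if more than one thread
    // increments it. With a strict single writer, a volatile long would do.
    private final AtomicLong handleLoadSnapshotCount = new AtomicLong();

    // A volatile field gives readers the most recently written value
    // without taking a lock.
    private volatile long latestSnapshotSizeBytes = 0L;
    private volatile int currentMetadataVersion = 0;

    void onSnapshotLoaded(long sizeBytes) {
        handleLoadSnapshotCount.incrementAndGet();
        latestSnapshotSizeBytes = sizeBytes;
    }

    void onMetadataVersionChange(int version) {
        currentMetadataVersion = version;
    }

    // Readers never block the writer, and the writer never blocks readers.
    long handleLoadSnapshotCount() {
        return handleLoadSnapshotCount.get();
    }

    long latestSnapshotSizeBytes() {
        return latestSnapshotSizeBytes;
    }

    int currentMetadataVersion() {
        return currentMetadataVersion;
    }
}
```

There is a trade-off here: each value is read atomically, but two values are not read as a consistent pair. For example, a reader may briefly see the new snapshot count together with the old snapshot size. That is usually acceptable for monitoring data.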
...
This metric counts the number of times we have loaded a metadata snapshot. Loading a snapshot is an O(N) operation in the size of the metadata, because it involves reloading the full metadata state, so it is helpful to know when this has occurred.
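The counter only increases from the time the process starts. A poller can therefore detect snapshot loads by comparing successive samples. The following is a minimal sketch, assuming the samples come from some periodic monitoring poll. `SnapshotLoadDetector` is a hypothetical name, and the sketch treats a drop in the value as a process restart.

```java
// Sketch: detect snapshot loads from successive samples of
// HandleLoadSnapshotCount. The class name is hypothetical.
class SnapshotLoadDetector {
    private long lastCount = -1L; // -1 means "no sample yet"

    /**
     * Returns how many snapshot loads happened since the previous sample.
     * The counter restarts from zero with the process, so a decrease is
     * treated as a restart, and the new value is counted as-is.
     */
    long loadsSinceLastSample(long currentCount) {
        long loads;
        if (lastCount < 0) {
            loads = 0L;                      // first sample: no baseline yet
        } else if (currentCount < lastCount) {
            loads = currentCount;            // process restarted, counter reset
        } else {
            loads = currentCount - lastCount;
        }
        lastCount = currentCount;
        return loads;
    }
}
```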
LatestSnapshotSize
This metric is useful to monitor the size of the snapshot generated by the cluster. In general, the larger the snapshot gets, the more resources the cluster will need.
LatestSnapshotDelayMs
This metric is useful for monitoring how long it has been since the node last generated a snapshot. If this delay grows too large, it may indicate a problem. A node that restarts must replay every metadata record written after the latest snapshot, so loading times might also become very large.
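Both LatestSnapshotDelayMs and LatestSnapshotSize report 0 before the first snapshot exists. An alert on the delay therefore needs a way to distinguish "no snapshot yet" from "a snapshot was just taken". The following is a minimal sketch that uses the size for this purpose, assuming a real snapshot is never 0 bytes. `SnapshotStalenessCheck`, its `Status` values, and the threshold are hypothetical and not part of this proposal.

```java
// Sketch of a staleness check built on LatestSnapshotDelayMs.
// The class, the Status enum, and the threshold are hypothetical.
class SnapshotStalenessCheck {
    enum Status { NO_SNAPSHOT_YET, OK, STALE }

    private final long maxDelayMs;

    SnapshotStalenessCheck(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
    }

    /**
     * Both metrics are 0 before the first snapshot, so a size of 0
     * (assumed never to occur for a real snapshot) means "no snapshot yet",
     * and a delay of 0 alone is not treated as a problem.
     */
    Status evaluate(long latestSnapshotDelayMs, long latestSnapshotSizeBytes) {
        if (latestSnapshotSizeBytes == 0L) {
            return Status.NO_SNAPSHOT_YET;
        }
        return latestSnapshotDelayMs > maxDelayMs ? Status.STALE : Status.OK;
    }
}
```

A suitable threshold depends on how often a given cluster is configured to generate snapshots, so the constructor takes it as a parameter instead of hard-coding a value.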
Compatibility, Deprecation, and Migration Plan
...