Current state: Approved
Discussion thread: here
JIRA: KAFKA-13883
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
With KRaft, Kafka added a new controller quorum to the cluster. These controllers need to be able to commit records for Kafka to be available. One way to measure availability is to periodically cause the high-watermark and the last committed offset to increase. Monitoring services can verify that these last committed offsets are advancing. They can also use these metrics to check that all of the brokers and controllers are reasonably close to each other's offsets.
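As a sketch of the monitoring idea above (the class, method names, and drift threshold here are hypothetical illustrations, not part of this KIP):

```java
// Hypothetical monitoring checks: verify that a node's last committed
// offset is advancing, and that every node is within a bounded distance
// of the largest last committed offset observed in the cluster.
public final class QuorumHealthCheck {

    // A node is "advancing" if its last committed offset grew since the
    // previous poll.
    public static boolean isAdvancing(long previousOffset, long currentOffset) {
        return currentOffset > previousOffset;
    }

    // All nodes are considered close enough if each one is within
    // maxDrift records of the maximum last committed offset.
    public static boolean withinDrift(long[] lastCommittedOffsets, long maxDrift) {
        long max = Long.MIN_VALUE;
        for (long offset : lastCommittedOffsets) {
            max = Math.max(max, offset);
        }
        for (long offset : lastCommittedOffsets) {
            if (max - offset > maxDrift) {
                return false;
            }
        }
        return true;
    }
}
```

A monitoring service would poll each node's offset metric on an interval and alert when either check fails.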
Add a new record that periodically advances the log end offset (LEO) and high-watermark. Applying this record will not change controller or broker state. This record will not be included in the cluster metadata snapshot.
{
  "apiKey": TBD,
  "type": "metadata",
  "name": "NoOpRecord",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": []
}
kafka.controller:type=KafkaController,name=MetadataLastAppliedRecordOffset
kafka.controller:type=KafkaController,name=MetadataLastCommittedRecordOffset
The active controller will report the offset of the last committed record it consumed. Inactive controllers will always report the same value in MetadataLastAppliedRecordOffset.
kafka.controller:type=KafkaController,name=MetadataLastAppliedRecordTimestamp
kafka.controller:type=KafkaController,name=MetadataLastAppliedRecordLagMs
kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset
kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp
kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms
kafka.server:type=broker-metadata-metrics,name=pending-record-processing-time-us-avg
kafka.server:type=broker-metadata-metrics,name=pending-record-processing-time-us-max
kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
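A minimal sketch of how the lag and batch-size metrics above could be derived (the class and method names are assumptions for illustration, not Kafka's actual metrics implementation): the lag in milliseconds is presumably the wall-clock time minus the append timestamp of the last applied record, and the batch-size metrics are a running average and maximum over observed batches.

```java
// Illustrative derivation of the broker metadata metrics (assumed
// semantics, not Kafka's actual implementation).
public final class BrokerMetadataMetrics {
    private long batchCount = 0;
    private long batchBytesTotal = 0;
    private long batchBytesMax = 0;

    // last-applied-record-lag-ms: current time minus the timestamp of
    // the last applied record.
    public static long lagMs(long nowMs, long lastAppliedRecordTimestampMs) {
        return nowMs - lastAppliedRecordTimestampMs;
    }

    // Feed each observed record batch size into the running statistics.
    public void recordBatch(int sizeBytes) {
        batchCount++;
        batchBytesTotal += sizeBytes;
        batchBytesMax = Math.max(batchBytesMax, sizeBytes);
    }

    // record-batch-size-byte-avg
    public double batchSizeAvg() {
        return batchCount == 0 ? 0.0 : (double) batchBytesTotal / batchCount;
    }

    // record-batch-size-byte-max
    public long batchSizeMax() {
        return batchBytesMax;
    }
}
```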
The active controller will increase the LEO and high-watermark by periodically writing a no-op record (NoOpRecord) to the metadata log. The active controller will write this new record only if the IBP and metadata.version support this feature. See the backward compatibility section for details.
The implementation will only append the NoOpRecord if the LEO was not already advanced in the defined period.
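The periodic check above can be sketched as follows (the class name and fields are hypothetical; the actual controller implementation may differ): on each tick, compare the current LEO against the LEO seen at the previous tick, and append a NoOpRecord only when it has not moved.

```java
// Sketch of the controller's periodic no-op decision (illustrative,
// not Kafka's actual scheduler): append a NoOpRecord only when the log
// end offset has not advanced on its own during the last period.
public final class NoOpRecordScheduler {
    private long leoAtLastCheck = -1;

    // Called once per configured period with the current log end offset.
    // Returns true when a NoOpRecord should be appended.
    public boolean shouldAppendNoOpRecord(long currentLeo) {
        boolean advanced = currentLeo > leoAtLastCheck;
        leoAtLastCheck = currentLeo;
        return !advanced;
    }
}
```

Note that the appended NoOpRecord itself advances the LEO, so a quiet cluster settles into one no-op append per period rather than one per tick plus extras.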
In both the controller and broker, the metadata replaying code will be extended to ignore NoOpRecord.
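The replay-side handling can be sketched like this (the enum and dispatch are illustrative stand-ins, not Kafka's actual record classes): NoOpRecord falls through the dispatch without mutating any state.

```java
// Sketch of metadata replay dispatch (illustrative only): a NoOpRecord
// is recognized and deliberately ignored, so replaying it changes no
// controller or broker state.
public final class MetadataReplayer {
    enum RecordType { TOPIC_RECORD, CONFIG_RECORD, NO_OP_RECORD }

    private int stateChanges = 0;

    public void replay(RecordType type) {
        switch (type) {
            case NO_OP_RECORD:
                // Intentionally a no-op: applying this record must not
                // change any state.
                break;
            default:
                stateChanges++;  // stand-in for real state mutation
        }
    }

    public int stateChanges() {
        return stateChanges;
    }
}
```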
The IBP and metadata.version will be bumped. This feature and record will only be produced if the active controller is at the expected version or greater.
Users must use the same software version of DumpLogSegments as the server node when reading the cluster metadata log segments.
Instead of using the NoOpRecord metadata record, we could have added a control record in the KRaft layer. This solution has two problems.
It is possible for the active controller to report the max lag from all of the brokers. The brokers send the last committed offset that they read to the active controller. The controller can compute the maximum of these values and report it as a metric.
This works for brokers, but it does not work for controllers, because the controllers don't send metadata heartbeat RPCs.
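The max-lag computation in this rejected alternative could look like the following sketch (class and method names are hypothetical):

```java
// Sketch of the rejected alternative: the active controller computes the
// maximum lag across brokers from the last committed offsets they report.
public final class MaxLagCalculator {
    public static long maxLag(long controllerCommittedOffset, long[] brokerCommittedOffsets) {
        long maxLag = 0;
        for (long brokerOffset : brokerCommittedOffsets) {
            maxLag = Math.max(maxLag, controllerCommittedOffset - brokerOffset);
        }
        return maxLag;
    }
}
```

As noted above, the controller has no equivalent input for the other controllers, which is one reason this alternative was rejected.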