
Status

Current state: Under Discussion

Discussion thread: here

JIRA: KAFKA-13883

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

With KRaft, Kafka added a new controller quorum to the cluster. This quorum must be available for Kafka to be available. We are going to measure availability by periodically causing the high-watermark and the last committed offset to increase. A monitoring service can then compare the last committed offset reported by every controller and broker.

Public Interfaces

Cluster Metadata Records

NoopRecord

Add a new record that periodically advances the LEO (log end offset) and the high-watermark. Controller and broker state will not change when applying this record. This record will not be included in the cluster metadata snapshot.

{
  "apiKey": TBD,
  "type": "metadata",
  "name": "NoopRecord",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": []
}
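The intended effect of applying a NoopRecord can be sketched as follows. This is a minimal illustration, not Kafka's actual listener code: the `MetadataListener` class, its `handleCommit` method, and the `TopicRecord` stand-in are all hypothetical names introduced here. The point is only that a NoopRecord leaves the applied state untouched while the last committed offset still advances.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are real Kafka classes.
class NoopRecordSketch {
    interface Record {}

    // Stand-in for the proposed record: it carries no fields.
    static final class NoopRecord implements Record {}

    // Stand-in for any record that does change state.
    static final class TopicRecord implements Record {
        final String name;
        TopicRecord(String name) { this.name = name; }
    }

    static final class MetadataListener {
        // Toy "state": topic name -> offset at which it was created.
        final Map<String, Long> topics = new HashMap<>();
        long lastCommittedOffset = -1L;

        // Assumed to be called once per committed record, in offset order.
        void handleCommit(long offset, Record record) {
            if (record instanceof TopicRecord) {
                topics.put(((TopicRecord) record).name, offset);
            }
            // For a NoopRecord there is nothing to apply; only the offset moves.
            lastCommittedOffset = offset;
        }
    }
}
```

Because the offset is updated for every committed record, a listener that sees only NoopRecords still reports a growing last committed offset.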

Metrics

Controller

  1. kafka.controller:type=KafkaController,name=MetadataLastCommittedOffset 
    Reports the last committed offset consumed by the controller.
  2. kafka.controller:type=KafkaController,name=MetadataWriteOffset
    The active controller will report the offset of the latest write. Inactive controllers will report 0.
  3. kafka.controller:type=KafkaController,name=MetadataLastCommittedTimestamp
    Reports the append time of the last committed record batch.
  4. kafka.controller:type=KafkaController,name=MetadataLastCommittedLagMs
    Reports the difference between the local time and the append time of the last committed record batch.
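How the four controller values relate to each other can be sketched as below. The class and method names are hypothetical and only mirror the descriptions above; how Kafka would actually register these gauges is not covered here. The lag is computed exactly as described, as local time minus the append time of the last committed batch, so clock skew between nodes would show up in the value.

```java
// Hypothetical holder for the controller-side values; not a Kafka class.
class ControllerMetricsSketch {
    private long lastCommittedOffset = -1L;
    private long lastCommittedTimestampMs = -1L;
    private long lastWriteOffset = 0L;
    private boolean active = false;

    void setActive(boolean active) { this.active = active; }

    // Assumed to be called when a record batch is committed.
    void onCommit(long offset, long appendTimeMs) {
        lastCommittedOffset = offset;
        lastCommittedTimestampMs = appendTimeMs;
    }

    // Assumed to be called when the active controller appends to the log.
    void onWrite(long offset) { lastWriteOffset = offset; }

    long lastCommittedOffset() { return lastCommittedOffset; }

    long lastCommittedTimestampMs() { return lastCommittedTimestampMs; }

    // Inactive controllers report 0 for the write offset.
    long writeOffset() { return active ? lastWriteOffset : 0L; }

    // Local time minus the append time of the last committed batch.
    long lastCommittedLagMs(long nowMs) { return nowMs - lastCommittedTimestampMs; }
}
```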

Broker

  1. kafka.server.metadata:type=BrokerMetadataPublisher,name=MetadataLastCommittedOffset 
  2. kafka.server.metadata:type=BrokerMetadataPublisher,name=MetadataLastCommittedTimestamp 
  3. kafka.server.metadata:type=BrokerMetadataPublisher,name=MetadataLastCommittedLagMs 
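A monitoring service could compare the last committed offsets scraped from every controller and broker along the following lines. This is a sketch under assumptions: the node identifiers, the `laggingNodes` helper, and the `maxBehind` threshold are all made up for illustration, and how the metric values are collected (JMX or otherwise) is left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical monitoring helper; not part of Kafka.
class OffsetSpreadSketch {
    // Returns each node whose last committed offset trails the highest
    // reported offset by more than maxBehind, mapped to how far behind it is.
    static Map<String, Long> laggingNodes(Map<String, Long> offsetsByNode, long maxBehind) {
        Map<String, Long> lagging = new TreeMap<>();
        if (offsetsByNode.isEmpty()) {
            return lagging;
        }
        long highest = Collections.max(offsetsByNode.values());
        for (Map.Entry<String, Long> entry : offsetsByNode.entrySet()) {
            long behind = highest - entry.getValue();
            if (behind > maxBehind) {
                lagging.put(entry.getKey(), behind);
            }
        }
        return lagging;
    }
}
```

Since the NoopRecord keeps the offset moving even on an idle cluster, a node whose value stops increasing between scrapes is another signal such a service could alert on.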

Configuration

  1. confluent.metadata.monitor.write.ms - Interval, in milliseconds, between NoopRecord writes to the cluster metadata log.
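The timing decision behind this configuration can be sketched as a small helper. Everything here is an assumption made for illustration: the class name, the `maybeWrite` method, and the choice to pass the clock in explicitly are not taken from Kafka. The sketch also assumes that only the active controller writes the record.

```java
// Hypothetical timing helper; not a Kafka class.
class NoopWriteSchedulerSketch {
    private final long writeIntervalMs; // value of the write interval config
    private long lastWriteMs;

    NoopWriteSchedulerSketch(long writeIntervalMs, long startMs) {
        this.writeIntervalMs = writeIntervalMs;
        this.lastWriteMs = startMs;
    }

    // Returns true when the caller should append a NoopRecord now.
    // Assumption: inactive controllers never write.
    boolean maybeWrite(long nowMs, boolean isActive) {
        if (!isActive) {
            return false;
        }
        if (nowMs - lastWriteMs < writeIntervalMs) {
            return false;
        }
        lastWriteMs = nowMs;
        return true;
    }
}
```

Passing `nowMs` in rather than reading the system clock keeps the decision deterministic, which makes the behaviour easy to check in a unit test.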

Proposed Changes

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

Compatibility, Deprecation, and Migration Plan

The IBP and metadata.version will be bumped. The feature will only be enabled, and the record only produced, if the active controller is at the expected version or greater.

Rejected Alternatives

Control Records

Instead of using the NoopRecord metadata record, we could have added a control record in the KRaft layer. This solution has two problems.

  1. Control records in KRaft are not exposed to the controller and broker KRaft listener. This means that those listeners will not update their last committed offset when those records get committed. This would make it difficult to make the last committed metrics represent what the broker and controllers observe.
  2. Those records will not get snapshotted. It is possible for active controllers to never write to the metadata log if the user doesn't perform any admin or metadata operations. This means that KRaft will write these control records to the log without a mechanism for snapshotting (removing) those records.

Max Lag from the Active Controller

It is possible for the active controller to report the max lag from all of the brokers. The brokers send the last committed offset that they read to the active controller. The controller can compute the maximum of these values and report it as a metric.

This works for brokers but not for controllers, because controllers don’t send metadata heartbeat RPCs.
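For reference, the computation this rejected alternative would have required on the active controller can be sketched as below. The `maxLag` helper and the broker-id-to-offset map are hypothetical; the sketch assumes lag is measured in offsets, as the controller's own committed offset minus the offset each broker last reported.

```java
import java.util.Map;

// Hypothetical helper for the rejected alternative; not a Kafka class.
class MaxBrokerLagSketch {
    // Largest gap between the controller's committed offset and the
    // last committed offset reported by any broker. Returns 0 when no
    // broker has reported or none is behind.
    static long maxLag(long controllerCommittedOffset, Map<Integer, Long> brokerOffsets) {
        long max = 0L;
        for (long brokerOffset : brokerOffsets.values()) {
            max = Math.max(max, controllerCommittedOffset - brokerOffset);
        }
        return max;
    }
}
```

Only brokers appear in the map, which is exactly the gap described above: there is no equivalent report from the other controllers to feed into it.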

