Comment: Added proposal to expose DescribeQuorum API via the admin client

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The DescribeQuorum API, as defined in KIP-595, is intended to allow the admin client to query the status of the KRaft quorum, including information about voter lag.

Currently, even though it is a public API, the DescribeQuorum API is not accessible via the admin client. Furthermore, as implemented, the API response reports the state of the voters only in terms of their LogEndOffsets. While useful, this information by itself is not an accurate measure of voter lag. It gives some hint about the voters' state, but it is not a complete check because there is no good way to define an upper bound on how much LogEndOffset lag is problematic: the divergence in this value depends on the metadata load on the cluster at the time of measurement.

This KIP proposes making the DescribeQuorum API accessible via the admin client and augmenting the DescribeQuorum API response with additional information so that the liveness and lag of the voters in the quorum can be ascertained more accurately.

Public Interfaces

...

Additional classes to expose the DescribeQuorum API to the admin client:

Code Block
public class DescribeQuorumResult {

  private final KafkaFuture<DescribeQuorumResponseData> result;

  public DescribeQuorumResult(KafkaFuture<DescribeQuorumResponseData> result) {
    this.result = result;
  }

  /**
   * Returns a future which yields the DescribeQuorum response data.
   */
  public KafkaFuture<DescribeQuorumResponseData> result() {
    return result;
  }
}


Code Block
public class DescribeQuorumOptions extends AbstractOptions<DescribeQuorumOptions> {
}

DescribeQuorum Handler in the Admin Client

Code Block
    /**
     * Describe the state of the raft quorum
     * <p>
     * The following exceptions can be anticipated when calling {@code get()} on the futures obtained from
     * the returned {@code DescribeQuorumResult}:
     * <ul>
     *   <li>{@link org.apache.kafka.common.errors.ClusterAuthorizationException}
     *   If the authenticated user didn't have {@code DESCRIBE} access to the cluster.</li>
     *   <li>{@link org.apache.kafka.common.errors.TimeoutException}
     *   If the request timed out before the controller could describe the quorum.</li>
     * </ul>
     *
     * @param options The options to use when describing the quorum.
     * @return The DescribeQuorumResult.
     */
    DescribeQuorumResult describeQuorum(DescribeQuorumOptions options);

Proposed change in the DescribeQuorum Response:

Code Block
  "apiKey": 55,
  "type": "response",
  "name": "DescribeQuorumResponse",
  "validVersions": "0-1",
  "flexibleVersions": "0+",
  "fields": [
    { "name": "ErrorCode", "type": "int16", "versions": "0+",
      "about": "The top level error code."},
    { "name": "Topics", "type": "[]TopicData",
      "versions": "0+", "fields": [
      { "name": "TopicName", "type": "string", "versions": "0+", "entityType": "topicName",
        "about": "The topic name." },
      { "name": "Partitions", "type": "[]PartitionData",
        "versions": "0+", "fields": [
        { "name": "PartitionIndex", "type": "int32", "versions": "0+",
          "about": "The partition index." },
        { "name": "ErrorCode", "type": "int16", "versions": "0+"},
        { "name": "LeaderId", "type": "int32", "versions": "0+", "entityType": "brokerId",
          "about": "The ID of the current leader or -1 if the leader is unknown."},
        { "name": "LeaderEpoch", "type": "int32", "versions": "0+",
          "about": "The latest known leader epoch"},
        { "name": "HighWatermark", "type": "int64", "versions": "0+"},
        { "name": "CurrentVoters", "type": "[]ReplicaState", "versions": "0+" },
        { "name": "Observers", "type": "[]ReplicaState", "versions": "0+" }
      ]}
    ]}],
  "commonStructs": [
    { "name": "ReplicaState", "versions": "0+", "fields": [
      { "name": "ReplicaId", "type": "int32", "versions": "0+", "entityType": "brokerId" },
      { "name": "LogEndOffset", "type": "int64", "versions": "0+",
        "about": "The last known log end offset of the follower or -1 if it is unknown"},
      { "name": "LastFetchTime", "type": "int64", "versions": "1+",
        "about": "The last known leader wall clock time when a follower fetched from the leader or -1 if unknown"},
      { "name": "LastCaughtUpTime", "type": "int64", "versions": "1+",
        "about": "The leader wall clock append time of the offset for which the follower made the most recent fetch request or -1 if unknown"}
    ]}
  ]
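
One way to read the two version 1 fields is as leader-side bookkeeping that is updated on every follower fetch. The sketch below is a minimal illustration under that assumption, not the Kafka implementation: `QuorumFetchTracker`, `ReplicaView`, `onLeaderAppend`, and `onFetch` are hypothetical names, and the append-time index keyed by log end offset is an assumed data structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical leader-side tracker for LastFetchTime and LastCaughtUpTime. */
public class QuorumFetchTracker {

    /** Hypothetical mirror of the ReplicaState fields; -1 means unknown. */
    public static final class ReplicaView {
        public long logEndOffset = -1L;
        public long lastFetchTime = -1L;
        public long lastCaughtUpTime = -1L;
    }

    // Assumption: the leader remembers its wall-clock time for each append,
    // keyed by the log end offset that the append produced.
    private final TreeMap<Long, Long> appendTimeByEndOffset = new TreeMap<>();
    private final Map<Integer, ReplicaView> replicas = new HashMap<>();

    /** Record that the leader's log end offset advanced at leader time nowMs. */
    public void onLeaderAppend(long newLogEndOffset, long nowMs) {
        appendTimeByEndOffset.put(newLogEndOffset, nowMs);
    }

    /** Update a follower's state when it fetches at fetchOffset. */
    public ReplicaView onFetch(int replicaId, long fetchOffset, long nowMs) {
        ReplicaView view = replicas.computeIfAbsent(replicaId, id -> new ReplicaView());
        view.logEndOffset = fetchOffset;
        view.lastFetchTime = nowMs;
        // Latest append whose end offset the follower has already reached.
        Map.Entry<Long, Long> reached = appendTimeByEndOffset.floorEntry(fetchOffset);
        if (reached != null) {
            view.lastCaughtUpTime = reached.getValue();
        }
        return view;
    }
}
```

Under this reading, a follower that keeps fetching but never reaches newer offsets shows a fresh LastFetchTime and a stale LastCaughtUpTime, which is the distinction that LogEndOffset alone cannot express.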

...

Proposed Changes

This KIP proposes exposing the DescribeQuorum API to the admin client and adding two new fields (per voter) to the DescribeQuorum API response.

These fields are intended to approximate the "time-lag" between the leader and the followers in the quorum.
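
As a rough illustration of how a caller might turn the two fields into a time lag, the sketch below subtracts each timestamp from a reference time and treats -1 as unknown. The class `VoterLag`, its method names, and the `maxLagMs` threshold are hypothetical. The reference time `nowMs` is also an assumption: the response as proposed does not carry the leader's current clock, so a caller would have to approximate it.

```java
/** Hypothetical helpers for interpreting LastFetchTime and LastCaughtUpTime. */
public class VoterLag {

    public static final long UNKNOWN = -1L;

    /** Milliseconds since the follower last fetched, or -1 if unknown. */
    public static long millisSinceLastFetch(long nowMs, long lastFetchTime) {
        return lastFetchTime < 0 ? UNKNOWN : Math.max(0L, nowMs - lastFetchTime);
    }

    /** Approximate time lag behind the leader, or -1 if unknown. */
    public static long timeLagMs(long nowMs, long lastCaughtUpTime) {
        return lastCaughtUpTime < 0 ? UNKNOWN : Math.max(0L, nowMs - lastCaughtUpTime);
    }

    /** A voter counts as healthy only if both values are known and within maxLagMs. */
    public static boolean isHealthy(long nowMs, long lastFetchTime,
                                    long lastCaughtUpTime, long maxLagMs) {
        long sinceFetch = millisSinceLastFetch(nowMs, lastFetchTime);
        long lag = timeLagMs(nowMs, lastCaughtUpTime);
        return sinceFetch != UNKNOWN && lag != UNKNOWN
            && sinceFetch <= maxLagMs && lag <= maxLagMs;
    }
}
```

Unlike a bound on LogEndOffset divergence, a time threshold such as `maxLagMs` does not need to be retuned as the metadata load on the cluster changes.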

...