Versions Compared

Comment: add error code and end quorum logic update

...

  • INVALID_CLUSTER_ID: The request either included a clusterId which does not match the one expected by the leader or failed to include a clusterId when one was expected.
  • FENCED_LEADER_EPOCH: The leader epoch in the request is smaller than the latest known to the recipient of the request.
  • UNKNOWN_LEADER_EPOCH: The leader epoch in the request is larger than expected. Note that this is an unexpected error. Unlike in normal Kafka log replication, a follower cannot learn of a newer epoch before the leader does.
  • OFFSET_OUT_OF_RANGE: Used in the FetchQuorumRecords API to indicate that the follower has fetched from an invalid offset and should truncate to the offset/epoch indicated in the response.
  • NOT_QUORUM_LEADER: Used in DescribeQuorum and AlterQuorum to indicate that the recipient of the request is not the current leader.
  • INVALID_QUORUM_STATE: This error code is reserved for cases when a request conflicts with the local known state. For example, if two separate nodes try to become leader in the same epoch, then it indicates an illegal state change.
  • INCONSISTENT_VOTER_SET: Used when the request contains inconsistent membership.
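
As a rough illustration of the two epoch-related codes above, the comparison could be sketched as follows. The function name `check_leader_epoch` and the choice to compare against a single locally known epoch are assumptions made for this example; they are not taken from the proposal.

```python
def check_leader_epoch(request_epoch: int, local_epoch: int) -> str:
    """Hypothetical helper: map an epoch comparison to an error code name.

    Assumes the recipient tracks one "latest known" epoch (local_epoch).
    """
    if request_epoch < local_epoch:
        # The sender is behind the recipient.
        return "FENCED_LEADER_EPOCH"
    if request_epoch > local_epoch:
        # Unexpected: the sender claims an epoch the recipient has not seen.
        return "UNKNOWN_LEADER_EPOCH"
    return "NONE"


print(check_leader_epoch(3, 5))  # FENCED_LEADER_EPOCH
print(check_leader_epoch(7, 5))  # UNKNOWN_LEADER_EPOCH
print(check_leader_epoch(5, 5))  # NONE
```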

Vote

The Vote API is used by voters to hold an election. As mentioned above, the main difference from Raft is that this protocol is pull-based: voters send fetch requests to the leader in order to replicate the log. These fetches also serve as a liveness check for the leader. If a voter perceives the leader as down, it will declare itself a candidate and hold a new election. A voter will begin a new election under three conditions:

...

The EndQuorumEpoch API is used by a leader to gracefully step down so that an election can be held immediately without waiting for the election timeout. The primary use case is graceful shutdown: the request is sent if the voter shutting down is either the active leader or, when an election is in progress, a candidate. It is also used when the leader needs to be removed from the quorum following an AlterQuorum request.

The EndQuorumEpochRequest is sent to all voters in the quorum. In each request, the leader lists the preferred successors, sorted by each voter's current replicated offset in descending order. Each voter chooses its election delay according to its position in this list, so that the most up-to-date voter has the best chance of being elected. The voter with the highest priority becomes a candidate immediately instead of waiting for the next poll. For a successor with priority N > 0, the next election timeout is computed as:

Code Block
MIN(retryBackOffMaxMs, retryBackoffMs * 2^(N - 1))

where retryBackOffMaxMs is currently hard-coded as one second.
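
The formula above can be worked through with a small sketch. The function name `election_backoff_ms` and the base back-off of 100 ms are assumptions chosen for illustration; only the formula itself and the one-second cap come from the text. Priority 0 is treated as "start immediately", as described above.

```python
def election_backoff_ms(priority: int,
                        retry_backoff_ms: int,
                        retry_backoff_max_ms: int = 1000) -> int:
    """Hypothetical helper: election delay for a successor at the given priority.

    priority 0 is the most caught-up voter and starts its election immediately.
    For priority N > 0 the delay is MIN(max, base * 2^(N - 1)).
    """
    if priority < 0:
        raise ValueError("priority must be non-negative")
    if priority == 0:
        return 0
    return min(retry_backoff_max_ms, retry_backoff_ms * 2 ** (priority - 1))


# Assumed base back-off of 100 ms, purely for illustration.
for n in range(6):
    print(n, election_backoff_ms(n, retry_backoff_ms=100))
# 0 0
# 1 100
# 2 200
# 3 400
# 4 800
# 5 1000   (1600 capped at the one-second maximum)
```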

Request Schema

Code Block
{
  "apiKey": N,
  "type": "request",
  "name": "EndQuorumEpochRequest",
  "validVersions": "0",
  "fields": [
    {"name": "ClusterId", "type": "string", "versions": "0+"},
    {"name": "ReplicaId", "type": "int32", "versions": "0+",
     "about": "The ID of the replica sending this request"},
    {"name": "LeaderId", "type": "int32", "versions": "0+",
     "about": "The current leader ID or -1 if there is a vote in progress"},
    {"name": "LeaderEpoch", "type": "int32", "versions": "0+",
     "about": "The current epoch"},
    {"name": "PreferredSuccessors", "type": "[]int32", "versions": "0+",
     "about": "A sorted list of preferred successors to start the election"}
  ]
}

Note that LeaderId and ReplicaId will be the same if the sender has been elected leader. If the replica is a candidate in an ongoing election, then LeaderId will be -1.
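
A minimal sketch of how a sender might populate the request fields follows. The function `build_end_quorum_epoch_request`, the `replicated_offsets` map, the exclusion of the sender from the successor list, and the tie-break by voter id are all assumptions made for this example; the proposal only states that successors are sorted by replicated offset in descending order.

```python
from typing import Dict, List


def build_end_quorum_epoch_request(cluster_id: str,
                                   replica_id: int,
                                   leader_id: int,
                                   leader_epoch: int,
                                   replicated_offsets: Dict[int, int]) -> dict:
    """Hypothetical helper: build the request fields as a plain dict.

    replicated_offsets maps voter id -> last replicated offset as seen by the sender.
    """
    # Sort by offset descending; break ties by voter id (an assumption) so the
    # ordering is deterministic. The sender itself is left out (also an assumption).
    preferred: List[int] = sorted(
        (voter for voter in replicated_offsets if voter != replica_id),
        key=lambda voter: (-replicated_offsets[voter], voter),
    )
    return {
        "ClusterId": cluster_id,
        "ReplicaId": replica_id,
        "LeaderId": leader_id,  # -1 if the sender is a candidate in an ongoing election
        "LeaderEpoch": leader_epoch,
        "PreferredSuccessors": preferred,
    }


request = build_end_quorum_epoch_request(
    cluster_id="example-cluster", replica_id=1, leader_id=1, leader_epoch=5,
    replicated_offsets={1: 300, 2: 250, 3: 250, 4: 90},
)
print(request["PreferredSuccessors"])  # [2, 3, 4]
```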

...

Upon receiving the EndQuorumEpoch, the voter checks if the epoch from the request is greater than or equal to its last known epoch. If the epoch is smaller than the last known epoch, or the leader id is not known for this epoch, the request is rejected. Then the voter checks whether it is in the given list of preferred successors. If not, it returns INCONSISTENT_VOTER_SET. If both validations pass, the voter can transition to candidate state immediately if it is first in the list. Otherwise it waits for the computed back-off timeout before starting an election, as stated in the previous section. Before beginning to collect votes, the voter must update the quorum-state file.
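
The handling steps can be sketched roughly as below, following the updated logic in which the first preferred successor starts immediately and the others wait for the computed back-off. The function name, the `leader_known` flag, and the generic "REJECTED" label are assumptions for this example; the text does not name the error code used for the epoch and leader checks, and the quorum-state file update is omitted.

```python
from typing import List, Optional, Tuple

RETRY_BACKOFF_MAX_MS = 1000  # "currently hard coded as one second"


def handle_end_quorum_epoch(local_id: int,
                            last_known_epoch: int,
                            leader_known: bool,
                            request_epoch: int,
                            preferred_successors: List[int],
                            retry_backoff_ms: int) -> Tuple[str, Optional[int]]:
    """Hypothetical handler: returns (error, election_delay_ms).

    election_delay_ms is None when the request is not accepted.
    """
    # Validation 1: stale epoch or unknown leader for this epoch.
    if request_epoch < last_known_epoch or not leader_known:
        return ("REJECTED", None)  # placeholder label; exact code not given in the text
    # Validation 2: this voter must be one of the preferred successors.
    if local_id not in preferred_successors:
        return ("INCONSISTENT_VOTER_SET", None)
    # Position in the list is the priority; 0 means most caught-up.
    priority = preferred_successors.index(local_id)
    if priority == 0:
        return ("NONE", 0)  # become a candidate immediately
    delay = min(RETRY_BACKOFF_MAX_MS, retry_backoff_ms * 2 ** (priority - 1))
    return ("NONE", delay)


print(handle_end_quorum_epoch(2, 5, True, 5, [2, 3, 4], 100))  # ('NONE', 0)
print(handle_end_quorum_epoch(4, 5, True, 5, [2, 3, 4], 100))  # ('NONE', 200)
print(handle_end_quorum_epoch(9, 5, True, 5, [2, 3, 4], 100))  # ('INCONSISTENT_VOTER_SET', None)
print(handle_end_quorum_epoch(2, 6, True, 5, [2, 3, 4], 100))  # ('REJECTED', None)
```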

EndQuorumEpoch Response Handling

...