Status

Current state: Under Discussion

...

This KIP goes over scenarios where we might expect disruptive servers and discusses how Pre-Vote (as originally detailed in the Raft paper and in KIP-650), along with followers rejecting Pre-Vote requests, can ensure correctness when it comes to network partitions (as well as quorum reconfiguration and failed disk scenarios).

Pre-Vote is the idea of "canvassing" the cluster: a server first checks whether it would receive a majority of votes. If yes, it increases its epoch and sends a disruptive vote request. If not, it does not increase its epoch and does not send a vote request.
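The canvassing rule above can be sketched as follows. This is a minimal illustration, not the KIP's implementation: the class and method names are hypothetical, and it assumes the server counts its own pre-vote toward the majority, as a Raft candidate does with its own vote.

```java
/** Hypothetical sketch of the Pre-Vote decision; names are illustrative only. */
final class PreVoteSketch {

    /**
     * Returns true if the canvass succeeded.
     *
     * @param grantsFromOthers number of other voters that granted the pre-vote
     * @param voterCount       total number of voters, including this server
     */
    static boolean wonPreVote(int grantsFromOthers, int voterCount) {
        // Assumption: the server counts its own pre-vote.
        int granted = 1 + grantsFromOthers;
        return granted > voterCount / 2;
    }

    /** Epoch to use after the canvass: bumped only if a majority granted the pre-vote. */
    static int nextEpoch(int currentEpoch, int grantsFromOthers, int voterCount) {
        return wonPreVote(grantsFromOthers, voterCount) ? currentEpoch + 1 : currentEpoch;
    }
}
```

For example, in a three-voter quorum a single grant from another voter is enough to proceed, while a server that receives no grants keeps its epoch and sends no vote request.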

Followers rejecting Pre-Vote requests entails servers rejecting any Pre-Vote requests received prior to their own fetch timeout expiring. The idea here is that if we've recently heard from a leader, we should not attempt to elect a new one just yet.
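A minimal sketch of this rejection check, under stated assumptions: `lastLeaderContactMs` is a hypothetical timestamp of the follower's last successful fetch from the leader, and the fetch timeout is treated as expired once the full timeout has elapsed. Neither name comes from the Kafka code base.

```java
/** Hypothetical sketch of a follower's Pre-Vote rejection rule; names are illustrative only. */
final class PreVoteRejectionSketch {

    /**
     * Returns true if the follower should reject an incoming Pre-Vote request
     * because its own fetch timeout has not yet expired.
     *
     * @param nowMs               current time in milliseconds
     * @param lastLeaderContactMs assumed time of the last successful fetch from the leader
     * @param fetchTimeoutMs      the follower's fetch timeout
     */
    static boolean shouldRejectPreVote(long nowMs, long lastLeaderContactMs, long fetchTimeoutMs) {
        // The leader was heard from recently, so do not help elect a new one yet.
        return nowMs - lastLeaderContactMs < fetchTimeoutMs;
    }
}
```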

Throughout this KIP, we will differentiate between Pre-Vote and the original Vote request behavior with "Pre-Vote" and "standard Vote".

Disruptive server scenarios

Network Partition

When a follower becomes partitioned from the rest of the quorum, it will continuously increase its epoch to start elections until it is able to regain connection to the leader/rest of the quorum. When the follower regains connectivity, it will disturb the rest of the quorum, as they will be forced to participate in an unnecessary election. While this situation only results in one leader stepping down, as we start supporting larger quorums these events may occur more frequently per quorum.
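To make the epoch inflation concrete, here is a small illustrative model. It assumes the partitioned follower can never reach a majority while partitioned, and it counts one election attempt per election timeout. The class and method names are hypothetical.

```java
/** Hypothetical model of epoch growth on a partitioned follower; names are illustrative only. */
final class PartitionEpochSketch {

    /**
     * Returns the follower's epoch after a partition that lasted for the given
     * number of election timeouts.
     */
    static int epochAfterPartition(int startEpoch, int electionTimeouts, boolean preVoteEnabled) {
        int epoch = startEpoch;
        for (int i = 0; i < electionTimeouts; i++) {
            if (!preVoteEnabled) {
                // Without Pre-Vote: each timeout starts a new election with a higher epoch.
                epoch++;
            }
            // With Pre-Vote: the canvass cannot reach a majority, so the epoch is unchanged.
        }
        return epoch;
    }
}
```

In this model, a follower that starts at epoch 5 and sits through ten election timeouts rejoins at epoch 15 without Pre-Vote, which forces the quorum into an election. With Pre-Vote it rejoins at epoch 5.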

...

todo: Does VoteResponse need to change to indicate whether the response was for a Pre-Vote or a standard Vote request? (It should be possible to retrieve the request object for each response, but this isn't really done currently.)

Proposed Changes

Pre-Vote

We add a new state Prospective for servers which are sending Pre-Vote requests, and new state transitions.

Code Block
/**
 * Unattached|Resigned transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Voted: After granting a standard vote to a candidate
 *    Prospective: After expiration of the election timeout
 *    Follower: After discovering a leader with an equal or larger epoch
 *
 * Voted transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Prospective: After expiration of the election timeout
 *
 * Prospective transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Candidate: After receiving a majority of pre-votes
 * 
 * Candidate transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Prospective: After expiration of the election timeout
 *    Leader: After receiving a majority of standard votes
 *
 * Leader transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Resigned: When shutting down gracefully
 *
 * Follower transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Prospective: After expiration of the fetch timeout
 *    Follower: After discovering a leader with a larger epoch
 *
 * Observers follow a simpler state machine. The Voted/Candidate/Leader/Resigned
 * states are not possible for observers, so the only transitions that are possible
 * are between Unattached and Follower.
 *
 * Unattached transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Follower: After discovering a leader with an equal or larger epoch
 *
 * Follower transitions to:
 *    Unattached: After learning of a new election with a higher epoch
 *    Follower: After discovering a leader with a larger epoch
 *
 */
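The voter transitions listed in the comment above could be encoded roughly as follows. This is a sketch under assumptions, not the actual Kafka state classes: the enum, the method names and the grouping of events are hypothetical, and only transitions named in the comment are modeled. The Voted and Follower transitions that carry extra context are omitted.

```java
/** Hypothetical encoding of the voter state transitions; names are illustrative only. */
final class VoterStateSketch {

    enum State { UNATTACHED, RESIGNED, VOTED, PROSPECTIVE, CANDIDATE, LEADER, FOLLOWER }

    /** Next state when the relevant timeout expires (election timeout, or the follower's timeout). */
    static State onTimeout(State s) {
        switch (s) {
            case UNATTACHED:
            case RESIGNED:
            case VOTED:
            case CANDIDATE:
            case FOLLOWER:
                // With Pre-Vote, a timeout leads to Prospective rather than straight to Candidate.
                return State.PROSPECTIVE;
            default:
                // Leader and Prospective: no timeout transition is listed in the comment.
                return s;
        }
    }

    /** Only a Prospective server that wins a majority of pre-votes becomes a Candidate. */
    static State onPreVoteMajority(State s) {
        return s == State.PROSPECTIVE ? State.CANDIDATE : s;
    }

    /** Only a Candidate that wins a majority of standard votes becomes Leader. */
    static State onStandardVoteMajority(State s) {
        return s == State.CANDIDATE ? State.LEADER : s;
    }

    /** Learning of a new election with a higher epoch always leads to Unattached. */
    static State onHigherEpoch(State s) {
        return State.UNATTACHED;
    }
}
```

The point of the sketch is that no state moves directly to Candidate on a timeout: every path to a standard election passes through Prospective first.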

...