Status

Current state: Under Discussion

...

  • Scenario B: A server in an old configuration (e.g. S1 in the below diagram, pg. 41 of the Raft paper) starts a “pre-vote” when the leader is temporarily unavailable, and is elected because it is as up-to-date as the majority of the quorum. We cannot technically rely on the original leader replicating fast enough to remove S1 from the quorum - we can imagine some bug/limitation with quorum reconfiguration causing S1 to continuously try to start elections while the leader is trying to remove it from the quorum. This scenario will be covered by KIP-853: KRaft Controller Membership Changes or future work if not covered here.

...

We currently use ApiVersions to gate new/newer versions of Raft APIs from being used before all servers can support them. This is useful in the upgrade scenario for Pre-Vote - if a server attempts to send out a Pre-Vote request while any other server in the quorum does not understand it, it will get back an UnsupportedVersionException from the network client and knows to default back to the old behavior. Specifically, the server will transition immediately from Prospective to Candidate state, and will send standard Vote requests instead, which can be understood by servers on older software versions.

Let's take a look at an edge case. As the network client will only check the supported version of the peer that we are intending to send a request to, we can imagine a scenario where a server first sends PreVotes to peers which understand PreVote, and then attempts to send PreVote to a peer which does not. If the server receives and processes a majority of granted PreVote responses prior to hitting the UnsupportedVersionException, it can transition to Candidate phase. Otherwise, it will also transition to Candidate phase once it hits the exception, and send standard vote requests to all servers. Any PreVote responses received while in Candidate phase would be ignored.
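The fallback and the edge case above can be sketched as a small state machine. This is only an illustration and not the KafkaRaftClient implementation: the class PreVoteFallbackSketch and all of its methods are hypothetical names, the real UnsupportedVersionException is modeled as a plain callback, and the sketch assumes the Prospective server counts its own pre-vote toward the majority.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the Prospective -> Candidate transitions described above. */
class PreVoteFallbackSketch {
    enum State { PROSPECTIVE, CANDIDATE }

    private final int majority;
    private final Set<Integer> granted = new HashSet<>();
    private State state = State.PROSPECTIVE;

    PreVoteFallbackSketch(int localId, int voterCount) {
        this.majority = voterCount / 2 + 1;
        // Assumption: the local server implicitly grants its own pre-vote.
        granted.add(localId);
    }

    /** A peer that understands PreVote granted our pre-vote. */
    void onPreVoteGranted(int voterId) {
        if (state != State.PROSPECTIVE) {
            return; // PreVote responses received while in Candidate are ignored
        }
        granted.add(voterId);
        if (granted.size() >= majority) {
            state = State.CANDIDATE; // majority of granted pre-votes
        }
    }

    /** Stand-in for the network client reporting that a peer lacks PreVote support. */
    void onUnsupportedVersion() {
        if (state == State.PROSPECTIVE) {
            // Fall back to the old behavior: become Candidate and
            // send standard vote requests to all servers.
            state = State.CANDIDATE;
        }
    }

    boolean isCandidate() {
        return state == State.CANDIDATE;
    }

    int grantedCount() {
        return granted.size();
    }

    public static void main(String[] args) {
        // Five voters: one peer grants, then another peer turns out to be on old software.
        PreVoteFallbackSketch server = new PreVoteFallbackSketch(0, 5);
        server.onPreVoteGranted(1);
        server.onUnsupportedVersion();
        server.onPreVoteGranted(2); // late response, ignored
        System.out.println("candidate=" + server.isCandidate()
            + " grantedPreVotes=" + server.grantedCount());
    }
}
```

Both paths end in Candidate, so the order in which the exception and the PreVote responses arrive does not change the outcome, only whether the pre-vote round contributed to it.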

Test Plan

This will be tested with unit tests, integration tests, system tests, and TLA+. 

...

Rejecting VoteRequests received within fetch timeout (w/o Pre-Vote) 

This was originally proposed in the Raft paper as a necessary safeguard to prevent Scenario A from occurring, but we can see how this could extend to cover all the other disruptive scenarios mentioned.

...