...

Another way is to reject election requests sent by servers in old configurations that are due for removal; with Pre-Vote implemented, this would not result in any epoch bumps. This could increase the chance of unavailability if the old server is the only one eligible for leadership. To safeguard against this, servers could reject such election requests only if they are received before the server's fetch timeout expires.
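As a minimal sketch of this rule, the check below assumes the receiver knows the voter set of its current configuration and tracks whether its fetch timeout has expired. The class, method and parameter names are hypothetical and are not taken from the Kafka codebase.

```java
import java.util.Set;

// Hypothetical sketch: decide whether to reject an election request from a
// server that is no longer part of the receiver's current configuration.
final class OldConfigVoteFilter {

    // Returns true if the request should be rejected outright.
    // voters: ids of the voters in the receiver's current configuration (assumed).
    // fetchTimeoutExpired: true once the receiver has gone fetch.timeout.ms
    // without hearing from a leader (assumed).
    static boolean rejectElectionRequest(int candidateId,
                                         Set<Integer> voters,
                                         boolean fetchTimeoutExpired) {
        boolean fromRemovedServer = !voters.contains(candidateId);
        // Reject only while the fetch timeout is still running, so an old server
        // that is the only one eligible for leadership can still win later.
        return fromRemovedServer && !fetchTimeoutExpired;
    }

    public static void main(String[] args) {
        Set<Integer> voters = Set.of(2, 3, 4);
        System.out.println(rejectElectionRequest(1, voters, false)); // true
        System.out.println(rejectElectionRequest(1, voters, true));  // false
        System.out.println(rejectElectionRequest(2, voters, false)); // false
    }
}
```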

Servers in new configuration

...

We will add a new field, PreVote, to VoteRequest to signal whether the request is a pre-vote. The candidate does not increase its epoch before sending a pre-vote request. The VoteResponse schema does not need any additional fields, though it still needs a version bump.

...

A candidate will now send a VoteRequest with the PreVote field set to true when its election timeout expires. If (majority - 1) of the VoteResponses received grant the vote, the candidate will then bump its epoch and send a VoteRequest with PreVote set to false (which behaves the same way as before).
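The two-phase flow above can be sketched as follows. This is a simplified model, not the KRaft implementation: VoteRequest here is a hypothetical record holding only the fields the flow needs, and the sketch assumes the pre-vote carries the candidate's current, unbumped epoch.

```java
import java.util.List;

// Hypothetical sketch of a candidate's two-phase election with Pre-Vote.
final class PreVoteCandidate {

    // Simplified request shape; preVote mirrors the proposed PreVote field.
    record VoteRequest(int candidateId, int candidateEpoch, boolean preVote) {}

    // Grants needed from the other voters: the candidate votes for itself,
    // so it needs (majority - 1) grants from the rest of the quorum.
    static int grantsNeeded(int voterCount) {
        int majority = voterCount / 2 + 1;
        return majority - 1;
    }

    // Builds the pre-vote request; the candidate's epoch is NOT bumped yet.
    static VoteRequest preVoteRequest(int id, int currentEpoch) {
        return new VoteRequest(id, currentEpoch, true);
    }

    // Given the pre-vote responses, returns the standard vote request to send
    // next (epoch bumped, preVote=false), or null if the pre-vote failed.
    static VoteRequest afterPreVote(int id, int currentEpoch, int voterCount,
                                    List<Boolean> preVoteGrants) {
        long granted = preVoteGrants.stream().filter(g -> g).count();
        if (granted >= grantsNeeded(voterCount)) {
            return new VoteRequest(id, currentEpoch + 1, false);
        }
        return null; // stay at the current epoch and retry on the next timeout
    }

    public static void main(String[] args) {
        System.out.println(preVoteRequest(1, 5));
        // Five voters: two grants from the other four are enough.
        System.out.println(afterPreVote(1, 5, 5, List.of(true, true, false, false)));
        // Only one grant: no epoch bump, no standard vote request.
        System.out.println(afterPreVote(1, 5, 5, List.of(true, false, false, false)));
    }
}
```

Keeping the epoch bump behind a successful pre-vote is what lets a candidate that cannot win fail quietly, without pushing the rest of the quorum to a higher epoch.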

When servers receive VoteRequests with the PreVote field set to true, they will respond with VoteGranted set to

  • true if they haven't heard from a leader in fetch.timeout.ms.
  • false if they have heard from a leader in fetch.timeout.ms (this could help cover the 'Servers in old configuration' scenario).
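A minimal sketch of the voter side, modelling only the fetch-timeout rule described above; any other checks a real vote handler performs (for example, comparing logs) are omitted. The class and method names are hypothetical, and fetchTimeoutMs stands in for fetch.timeout.ms.

```java
// Hypothetical sketch of how a voter answers a request with PreVote=true.
final class PreVoteVoter {

    private final long fetchTimeoutMs;   // stands in for fetch.timeout.ms
    private long lastLeaderContactMs;    // last time we heard from a leader
    private boolean everHeardFromLeader;

    PreVoteVoter(long fetchTimeoutMs) {
        this.fetchTimeoutMs = fetchTimeoutMs;
    }

    // Call whenever a message from the current leader arrives.
    void onLeaderContact(long nowMs) {
        lastLeaderContactMs = nowMs;
        everHeardFromLeader = true;
    }

    // Value of VoteGranted for a pre-vote request: grant only if no leader
    // has been heard from within the last fetchTimeoutMs.
    boolean handlePreVote(long nowMs) {
        boolean heardRecently = everHeardFromLeader
                && nowMs - lastLeaderContactMs < fetchTimeoutMs;
        return !heardRecently;
    }

    public static void main(String[] args) {
        PreVoteVoter voter = new PreVoteVoter(2_000);
        voter.onLeaderContact(10_000);
        System.out.println(voter.handlePreVote(11_000)); // false: leader heard 1 s ago
        System.out.println(voter.handlePreVote(12_500)); // true: 2.5 s without a leader
    }
}
```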

(Not in scope) To address the disk loss and 'Servers in new configuration' scenario, one option would be to have servers respond false to vote requests from servers that have a new disk and haven't caught up on replication.

How does this prevent unnecessary elections when it comes to network partitions?

When a partitioned node rejoins and tries to force the cluster into an election, all nodes reject its pre-vote request because they have recently heard from the active leader. What becomes of the disruptive follower? It keeps starting elections but cannot be elected. It should rejoin the quorum when it discovers the higher epoch during the next valid election started by another node (todo: check this for accuracy; there should be other ways for the node to become a follower earlier).
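To illustrate the difference in epochs, the hypothetical simulation below tracks the rejoining node's epoch over repeated election timeouts, with and without Pre-Vote. It assumes the other voters keep hearing from the active leader, and it treats a successful pre-vote simply as an epoch bump without modelling the election that follows.

```java
import java.util.List;

// Hypothetical simulation: a rejoining node repeatedly times out while the
// other voters may still be hearing from the active leader.
final class RejoinSimulation {

    // msSinceLeaderContact: for each *other* voter, how long ago it last heard
    // from the active leader. A voter grants a pre-vote only if that is at
    // least fetchTimeoutMs (the rule described above).
    static int epochAfterAttempts(int startEpoch, int attempts, boolean preVote,
                                  List<Long> msSinceLeaderContact, long fetchTimeoutMs) {
        int voterCount = msSinceLeaderContact.size() + 1; // include the candidate
        int grantsNeeded = (voterCount / 2 + 1) - 1;       // candidate votes for itself
        int epoch = startEpoch;
        for (int i = 0; i < attempts; i++) {
            if (!preVote) {
                epoch++; // without Pre-Vote the candidate bumps its epoch right away
                continue;
            }
            long grants = msSinceLeaderContact.stream()
                    .filter(ms -> ms >= fetchTimeoutMs)
                    .count();
            if (grants >= grantsNeeded) {
                epoch++; // pre-vote succeeded; the real election may start
            }
        }
        return epoch;
    }

    public static void main(String[] args) {
        // The other two voters heard from the leader 100 ms ago; fetch timeout 2 s.
        List<Long> others = List.of(100L, 100L);
        System.out.println(epochAfterAttempts(7, 10, false, others, 2_000)); // 17
        System.out.println(epochAfterAttempts(7, 10, true, others, 2_000));  // 7
    }
}
```

Without Pre-Vote the disruptive node's epoch climbs on every timeout; with Pre-Vote it stays put, so the rest of the quorum never sees an epoch it has to catch up to.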

Could this prevent necessary elections?

Yes. See “Check Quorum” for why we need an additional safeguard.

Do we still need to reject VoteRequests received within fetch timeout if we have implemented Pre-Vote?

Yes, except that the logic becomes “Rejecting Pre-Vote requests received within fetch timeout”. We need to avoid bumping epochs without electing a new leader; otherwise the node requesting the election(s) may be unable to rejoin the quorum because its epoch is greater than everyone else's, even though its log is not necessarily as up to date. The following are two scenarios where Pre-Vote alone is not enough.

  • A node in an old configuration (e.g. S1 in the diagram below, pg. 41) starts a pre-vote while the leader is temporarily unavailable, and is elected because its log is as up to date as that of a majority of the quorum. The Raft paper argues that we cannot rely on the original leader replicating fast enough to avoid this scenario, however unlikely it is. We can imagine a bug or limitation in quorum reconfiguration that causes S1 to continuously try to rejoin the quorum (i.e. start elections) while the leader is trying to remove it.

...


  • We can also imagine a non-reconfiguration scenario where two nodes, one of which is the leader, simply cannot communicate with each other. Because the non-leader node cannot find a leader, it starts an election and may get elected. Because the former leader now cannot find the new leader, it in turn starts an election and may get elected. This could repeat in a cycle.
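The cycle in the second scenario can be reproduced with a small hypothetical simulation of three voters A, B and C where only the A-B link is broken. It compares a pre-vote rule that checks only log freshness against rejecting pre-votes within fetch timeout. It assumes C always finds the requester's log up to date, which is the simplification under which the cycle can occur; a real log comparison may break the cycle earlier.

```java
// Hypothetical simulation of the partial-partition cycle: voters A (0), B (1)
// and C, where only the A<->B link is broken and C reaches both.
final class PartialPartitionSimulation {

    enum Policy { LOG_CHECK_ONLY, REJECT_WITHIN_FETCH_TIMEOUT }

    // Returns how many times leadership changes over `rounds` election timeouts.
    static int leaderChanges(int rounds, Policy policy) {
        final int a = 0, b = 1;
        int leader = a;
        int changes = 0;
        for (int i = 0; i < rounds; i++) {
            // The voter cut off from the current leader times out and asks C.
            int candidate = (leader == a) ? b : a;
            // C's link to the current leader is healthy, so it heard from it recently.
            boolean cHeardFromLeaderRecently = true;
            boolean cGrants = switch (policy) {
                // Assumption: C always finds the candidate's log up to date.
                case LOG_CHECK_ONLY -> true;
                case REJECT_WITHIN_FETCH_TIMEOUT -> !cHeardFromLeaderRecently;
            };
            // The candidate's own vote plus C's vote is a majority of three.
            if (cGrants) {
                leader = candidate;
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println(leaderChanges(10, Policy.LOG_CHECK_ONLY));
        System.out.println(leaderChanges(10, Policy.REJECT_WITHIN_FETCH_TIMEOUT));
    }
}
```

With the log-only rule, leadership flips on every timeout; with the fetch-timeout rule, C rejects B's pre-votes because it is still hearing from A, so leadership stays with A.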

Compatibility, Deprecation, and Migration Plan

...