...

Compatibility, Deprecation, and Migration Plan

...

We can gate Pre-Vote with a new VoteRequest/VoteResponse version and metadata version, and gate rejecting Pre-Vote requests with the same metadata version bump.
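The gating condition can be sketched as a simple check. This is only an illustration under stated assumptions: the class `PreVoteGate`, its method, and all four parameters are hypothetical names, and the actual version numbers are not specified here, so they are passed in rather than hard-coded.

```java
/** Hypothetical sketch of version gating; not KRaft's actual API. */
final class PreVoteGate {
    /**
     * Returns true only when both gates are open: the peer supports the new
     * VoteRequest version and the cluster has reached the required metadata version.
     * All arguments are assumed inputs for illustration.
     */
    static boolean canSendPreVote(int peerMaxVoteRequestVersion,
                                  int minPreVoteRequestVersion,
                                  int clusterMetadataVersion,
                                  int minPreVoteMetadataVersion) {
        boolean rpcSupported = peerMaxVoteRequestVersion >= minPreVoteRequestVersion;
        boolean metadataSupported = clusterMetadataVersion >= minPreVoteMetadataVersion;
        return rpcSupported && metadataSupported;
    }
}
```

Requiring both conditions means a node falls back to standard voting whenever either the RPC version or the metadata version is too old.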

Test Plan

This will be tested with unit tests, integration tests, system tests, and TLA+. (todo)

...

This was originally proposed in the Raft paper as a necessary safeguard to prevent Scenario A from occurring, but we can see how this could extend to cover all the other disruptive scenarios mentioned.

  • When a partitioned node rejoins and forces the cluster to participate in an election …

    • if a majority of the cluster is not receiving fetch responses from the current leader, they consider the vote request and make the appropriate state transitions. An election would be needed in this case anyway.

    • if the rest of the cluster is still receiving fetch responses from the current leader, they reject the vote request from the disruptive follower. No one transitions to a new state (e.g. Unattached) as a result of the vote request, and the current leader is not disrupted.

  • For a node in an old configuration (that’s not in the new configuration) …

    • if the current leader is still responding to fetch requests in a reasonable amount of time, the node is prevented from starting and winning elections that would delay reconfiguration.

    • if the current leader is not responding to fetch requests, then the node could still win an election (this scenario calls for an election anyway). KIP-853 should cover preventing this case if necessary.

  • For a node with a new disk or data loss …

    • if the current leader is still responding to fetch requests in a reasonable amount of time, the node is prevented from starting and winning elections that could lead to loss of committed data.

    • if the current leader is not responding to fetch requests, we can reject VoteRequests from the node with a new disk or data loss if it isn't sufficiently caught up on replication. KIP-853 should cover this case with storage ids if necessary.
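All three scenarios rest on the same decision: a voter only considers a vote request when it has stopped hearing from the current leader. A minimal sketch of that rule follows. The class `LeaderLivenessCheck`, its methods, and the `fetchTimeoutMs` parameter are hypothetical names chosen for illustration, not KRaft's actual implementation.

```java
/** Hypothetical sketch: gate vote requests on recent contact with the leader. */
final class LeaderLivenessCheck {
    private final long fetchTimeoutMs;   // assumed: how long without a fetch response counts as "not responding"
    private long lastFetchResponseMs;    // time of the most recent fetch response from the leader

    LeaderLivenessCheck(long fetchTimeoutMs, long nowMs) {
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.lastFetchResponseMs = nowMs;
    }

    /** Record a fetch response from the current leader. */
    void onFetchResponse(long nowMs) {
        lastFetchResponseMs = nowMs;
    }

    /**
     * True if the leader has been silent for at least the timeout, in which case
     * the vote request is considered. Otherwise the request is rejected and the
     * voter makes no state transition.
     */
    boolean shouldConsiderVoteRequest(long nowMs) {
        return nowMs - lastFetchResponseMs >= fetchTimeoutMs;
    }
}
```

With a timeout of 1000 ms, a voter that received a fetch response 500 ms ago rejects the request, while one that has heard nothing for 1000 ms or more considers it.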

However, this would not be a good standalone alternative to Pre-Vote. Once a server starts a disruptive election (disruptive in the sense that the current leader still has a majority), its epoch may increase while none of the other servers' epochs do. The most likely way for the server to rejoin the quorum with its inflated epoch would then be to win an election. Because Pre-Vote requests do not increase epochs, a disruptive server can more easily rejoin the quorum once it finds that one of the servers in the cluster has been elected. (todo: can it join once it finds the current leader)
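The epoch drift can be made concrete with a small simulation. This sketch assumes a server that is cut off from the quorum and times out repeatedly, that a standard candidate bumps its epoch on every election attempt, and that a Pre-Vote attempt made while cut off never succeeds. The class and method names are hypothetical.

```java
/** Hypothetical sketch: epoch of an isolated server after repeated election timeouts. */
final class EpochDrift {
    /**
     * @param startEpoch epoch shared with the rest of the quorum before isolation
     * @param timeouts   number of election timeouts while isolated
     * @param preVote    whether the server uses Pre-Vote before a standard election
     */
    static int epochAfterTimeouts(int startEpoch, int timeouts, boolean preVote) {
        int epoch = startEpoch;
        for (int i = 0; i < timeouts; i++) {
            if (!preVote) {
                epoch++; // assumed: a standard candidate increments its epoch per attempt
            }
            // With Pre-Vote, the failed pre-vote leaves the epoch unchanged.
        }
        return epoch;
    }
}
```

Starting from epoch 5, three timeouts without Pre-Vote leave the isolated server at epoch 8 while the quorum stays at 5. With Pre-Vote the server remains at epoch 5.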

...