...

When a partitioned node rejoins and forces the cluster to participate in an election, all nodes reject the Pre-Vote request from the disruptive follower since they've recently heard from the active leader. What becomes of the disruptive follower? The disruptive node continuously kicks off elections but is unable to be elected. It should rejoin the quorum when it discovers the higher epoch on the next valid election by another node (todo: check this for accuracy, there should be other ways for the node to become follower earlier).

Can this prevent necessary elections?

Yes. If a leader is unable to receive fetch responses from a majority of nodes, it can impede followers that are able to communicate with it from voting in another eligible leader that can communicate with a majority of the cluster.

How

  • It doesn’t. Check Quorum ensures necessary elections can take place. Without it, Pre-Vote can cause quorum unavailability.

This is the reason why an additional "Check Quorum" safeguard is needed, which is what KAFKA-15489 implements. Check Quorum ensures a leader steps down if it is unable to receive fetch responses from a majority of nodes.

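The Check Quorum rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from KAFKA-15489: the class `CheckQuorumSketch`, its method names, and the timeout parameter are all hypothetical, and the sketch assumes the leader counts itself toward the majority and treats every voter as freshly heard from at the moment it becomes leader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a Check Quorum tracker kept by a leader.
// Names and structure are illustrative only, not Kafka's implementation.
final class CheckQuorumSketch {
    private final int voterCount;
    private final long timeoutMs;
    // Last time each *other* voter was heard from via a fetch.
    private final Map<Integer, Long> lastFetchMs = new HashMap<>();

    CheckQuorumSketch(int localId, Set<Integer> voterIds, long timeoutMs, long nowMs) {
        this.voterCount = voterIds.size();
        this.timeoutMs = timeoutMs;
        for (int id : voterIds) {
            if (id != localId) {
                // Assumption: the timer for every voter starts at election time.
                lastFetchMs.put(id, nowMs);
            }
        }
    }

    // Record that a voter fetched from this leader; non-voters are ignored.
    void onFetch(int followerId, long nowMs) {
        if (lastFetchMs.containsKey(followerId)) {
            lastFetchMs.put(followerId, nowMs);
        }
    }

    // True when fewer than a majority of voters (leader included) are in contact.
    boolean shouldResign(long nowMs) {
        int majority = voterCount / 2 + 1;
        int inContact = 1; // the leader itself
        for (long last : lastFetchMs.values()) {
            if (nowMs - last < timeoutMs) {
                inContact++;
            }
        }
        return inContact < majority;
    }
}
```

With three voters and a 1000 ms timeout, a leader that keeps hearing from one follower stays in office, while a leader that hears from neither follower for a full timeout would report `shouldResign` as true and step down, which lets the remaining nodes elect a reachable leader.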
Do we still need to reject VoteRequests received within fetch timeout if we have implemented Pre-Vote and Check Quorum?

Yes. Specifically, we would be rejecting Pre-Vote requests received within fetch timeout. We need to avoid bumping epochs without a new leader being elected; otherwise the node requesting elections will be unable to rejoin the quorum because its epoch is greater than everyone else's, while its log continues to fall behind.
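As a rough illustration of the rule above, the sketch below shows how a follower might answer a Pre-Vote request. It is a sketch under assumptions, not the KIP's implementation: `PreVoteSketch` and its fields are hypothetical names, and the log comparison (last epoch first, then end offset) is the usual Raft-style "at least as up-to-date" check rather than something stated in this section. The key property it shows is that handling a Pre-Vote never changes the receiver's epoch.

```java
// Hypothetical sketch of a follower's Pre-Vote handling.
// Names and structure are illustrative only, not Kafka's implementation.
final class PreVoteSketch {
    private final int epoch;
    private final int lastLogEpoch;
    private final long logEndOffset;
    private final long fetchTimeoutMs;
    private long lastLeaderContactMs;

    PreVoteSketch(int epoch, int lastLogEpoch, long logEndOffset,
                  long fetchTimeoutMs, long nowMs) {
        this.epoch = epoch;
        this.lastLogEpoch = lastLogEpoch;
        this.logEndOffset = logEndOffset;
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.lastLeaderContactMs = nowMs;
    }

    // Called whenever a successful fetch to the current leader completes.
    void onLeaderContact(long nowMs) {
        lastLeaderContactMs = nowMs;
    }

    // Decide whether to grant a Pre-Vote. The local epoch is never modified here.
    boolean grantPreVote(int candidateEpoch, int candidateLastLogEpoch,
                         long candidateLogEndOffset, long nowMs) {
        if (candidateEpoch < epoch) {
            return false; // stale candidate
        }
        if (nowMs - lastLeaderContactMs < fetchTimeoutMs) {
            return false; // heard from the leader within fetch timeout
        }
        // Assumed Raft-style check: candidate log must be at least as up-to-date.
        return candidateLastLogEpoch > lastLogEpoch
            || (candidateLastLogEpoch == lastLogEpoch
                && candidateLogEndOffset >= logEndOffset);
    }

    int epoch() {
        return epoch;
    }
}
```

Because a rejected Pre-Vote leaves every epoch untouched, a disruptive node that keeps asking cannot push its own epoch ahead of the rest of the quorum.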

The following are two scenarios where just having Pre-Vote is not enough.

...