...
- First it checks whether an epoch larger than the candidate epoch from the request is known. If so, the vote is rejected.
- It checks whether it has already voted in that candidate epoch. If it has, it grants the vote only if the candidate id matches the id it already voted for; otherwise, the vote is rejected.
- If the candidate epoch is larger than the currently known epoch, then it performs the following checks:
  - Check whether CandidateId is one of the expected voters. If not, then reject the vote. The candidate in this case may have been part of an incomplete AlterQuorum change, so the voter should accept the epoch bump and itself begin a new election.
  - Check that the candidate's log is at least as up-to-date as its own (see above for the comparison rules). If so, grant the vote by first updating the quorum-state file and then returning the response with voteGranted set to true; otherwise, reject the request in the response.
Also note that a candidate always votes for itself at the current candidate epoch. That means it also needs to update the quorum-state file to record "voting for myself" before sending out the vote requests. On the other hand, if it receives a Vote request with a larger candidate epoch, it can still grant that vote while transitioning back to the voter state, because a newer leader may have been elected for a newer epoch.
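The vote-handling checks above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the names VoterState, VoteRequest, VoteResponse, handle_vote and the start_election flag are hypothetical, the persisted list is a stand-in for the quorum-state file, and the log comparison is assumed to order logs by (last epoch, end offset), since the comparison rules are defined elsewhere in the document. The sketch also assumes that a request at the current epoch with no vote recorded is treated like a request with a newer epoch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoteRequest:
    candidate_id: int
    candidate_epoch: int
    last_epoch: int   # epoch of the candidate's last log entry (assumed field)
    end_offset: int   # candidate's log end offset (assumed field)


@dataclass
class VoteResponse:
    vote_granted: bool
    start_election: bool = False  # hypothetical hint: voter should begin its own election


@dataclass
class VoterState:
    node_id: int
    epoch: int
    voters: frozenset
    last_epoch: int
    end_offset: int
    voted_id: Optional[int] = None
    persisted: list = field(default_factory=list)  # stand-in for the quorum-state file

    def persist(self) -> None:
        self.persisted.append((self.epoch, self.voted_id))


def handle_vote(state: VoterState, req: VoteRequest) -> VoteResponse:
    # 1. A larger epoch than the candidate's is already known: reject.
    if req.candidate_epoch < state.epoch:
        return VoteResponse(False)
    # 2. Already voted in this epoch: grant only to the same candidate.
    if req.candidate_epoch == state.epoch and state.voted_id is not None:
        return VoteResponse(state.voted_id == req.candidate_id)
    # 3a. Candidate is not an expected voter: reject, accept the epoch bump
    #     (in memory only in this sketch), and signal that an election should start.
    if req.candidate_id not in state.voters:
        state.epoch = req.candidate_epoch
        state.voted_id = None
        return VoteResponse(False, start_election=True)
    # 3b. Candidate's log must be at least as up-to-date (assumed ordering).
    if (req.last_epoch, req.end_offset) < (state.last_epoch, state.end_offset):
        return VoteResponse(False)
    # Grant: record the vote in the quorum-state stand-in before responding.
    state.epoch = req.candidate_epoch
    state.voted_id = req.candidate_id
    state.persist()
    return VoteResponse(True)
```

The key ordering in the sketch is that the vote is persisted before the granting response is returned, so a restarted voter cannot vote twice in the same epoch.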
...
The effect of AlterQuorum is to change the TargetVoters field in the AlterQuorumMessage defined above. Once this is done, the leader will begin the process of bringing the new nodes into the quorum and kicking out the nodes which are no longer needed.
Cancellation: If TargetVoters is set to null in the request, then effectively we will cancel an ongoing reassignment and leave the quorum with the current voters. Generally, it is preferable to always specify the target quorum.
The preferable option is to always set the intended TargetVoters. Note that it is always possible to send a new AlterQuorum request even if the pending reassignment has not finished. So if we are in the middle of a reassignment from (1, 2, 3) to (4, 5, 6), then the user can cancel the reassignment by resubmitting (1, 2, 3) as the TargetVoters.
Request Schema
Code Block |
---|
{
"apiKey": N,
"type": "request",
"name": "AlterQuorum",
"validVersions": "0",
"flexibleVersions": "0+",
"fields": [
{"name": "ClusterId", "type": "string", "versions": "0+"},
{"name": "TargetVoters", "type": "[]Voter", "nullableVersions": "0+", "default": "null",
"about": "The target quorum, or null if this is a cancellation request",
"versions": "0+", "fields": [
{"name": "VoterId", "type": "int32", "versions": "0+"}
]}
]
} |
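As an illustration of the schema above, the following sketch builds the logical field layout of an AlterQuorum request as a plain dictionary. It only mirrors the field names ClusterId, TargetVoters and VoterId from the schema; it does not produce the binary wire format, and the helper name alter_quorum_request and the cluster id value are hypothetical. It shows both a reassignment to (4, 5, 6) and a cancellation done by resubmitting the original voters (1, 2, 3).

```python
import json
from typing import Iterable, Optional


def alter_quorum_request(cluster_id: str, target_voters: Optional[Iterable[int]]) -> dict:
    """Build the logical body of an AlterQuorum request (hypothetical helper)."""
    voters = None if target_voters is None else [{"VoterId": v} for v in target_voters]
    return {"ClusterId": cluster_id, "TargetVoters": voters}


# Start a reassignment from (1, 2, 3) to (4, 5, 6).
reassign = alter_quorum_request("example-cluster", [4, 5, 6])

# Cancel it by resubmitting the original voters as the target.
cancel = alter_quorum_request("example-cluster", [1, 2, 3])

print(json.dumps(reassign))
print(json.dumps(cancel))
```

Passing None for target_voters yields a null TargetVoters field, which corresponds to the null cancellation form described in the schema's "about" text.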
...
Note that there is one main subtlety with this process. When a follower receives the new quorum state, it immediately begins acting with the new state in mind. Specifically, if the follower becomes a candidate, it will expect votes from a majority of the new voters specified by the reassignment. However, it is possible that the AlterQuorumMessage gets truncated from the follower's log because a newly elected leader did not have it in its log. In this case, the follower needs to be able to revert to the previous quorum state. To make this simple, voters will only persist quorum state changes in quorum-state after they have been committed. Upon initialization, any uncommitted state changes will be found by scanning forward from the LastOffset indicated in the quorum-state.
The other note is that once the reassignment starts and the control record is propagated, it cannot be rolled back. Instead, the admin needs to wait for the current reassignment to finish and then perform another reassignment to get back to the old quorum. In version one, we don't think it's necessary to add cancellation support, but we would definitely reconsider once the main protocol stabilizes.
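The revert-on-truncation behavior for the quorum state can be sketched as follows. This is an illustrative model under stated assumptions: Record, effective_voters and majority are hypothetical names, the log is modeled as a list of records, and "scanning forward from LastOffset" is assumed to mean examining records whose offset is greater than LastOffset. Because only committed changes are persisted, the voter set is always derivable from the persisted state plus whatever AlterQuorumMessage records remain in the log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    offset: int
    # Set only for AlterQuorumMessage control records (hypothetical model).
    voters: Optional[frozenset] = None


def effective_voters(persisted_voters: frozenset, last_offset: int,
                     log: List[Record]) -> frozenset:
    """Voters to act on: the latest uncommitted change after last_offset, else the persisted set."""
    voters = persisted_voters
    for record in log:
        if record.offset > last_offset and record.voters is not None:
            voters = record.voters
    return voters


def majority(voters: frozenset) -> int:
    """Number of votes a candidate needs from the given voter set."""
    return len(voters) // 2 + 1


persisted = frozenset({1, 2, 3})  # committed quorum state from quorum-state
log = [Record(0), Record(1), Record(2, voters=frozenset({4, 5, 6}))]

# The follower acts on the new, still uncommitted voter set.
acting_on = effective_voters(persisted, last_offset=1, log=log)

# A new leader lacks offset 2, so the follower truncates and reverts.
truncated = [r for r in log if r.offset < 2]
after_truncation = effective_voters(persisted, last_offset=1, log=truncated)
```

In this model no explicit undo step is needed: after truncation the uncommitted record is simply gone, and recomputing from the persisted state yields the previous voters.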
In a similar vein, if the AlterQuorumMessage fails to be copied to all voters before a leader failure, then there could be temporary disagreement about voter membership. Each voter must act on the information it has in its own log when deciding whether to grant votes. It is possible in this case for a voter to receive a request from a non-voter (according to its own information). Voters must reject vote requests from non-voters, but that does not mean that the non-voter cannot ultimately win the election. Hence, when a voter receives a VoteRequest from a non-voter, it must then become a candidate.
Observer Promotion
To avoid downtime during the switch, newly added nodes should already be acting as up-to-date observers, so that the change does not unnecessarily harm cluster availability. There are two approaches to achieving this goal, driven either from the leader side or from the observer side:
...