...
To resolve this issue, we will piggy-back on the "quorum.fetch.timeout.ms" config, such that if the leader did not receive FetchQuorumRecords requests from a majority of the quorum for that amount of time, it would start sending Metadata requests to random nodes in the cluster to learn the latest quorum. If it could not connect to any known voter, the old leader shall reset its connection information and send out DiscoverBrokers. If a returned response includes a leader with a newer epoch, this zombie leader would step down and become an observer; if it then finds that it is still within the current quorum's voter list, it would start fetching from that leader. Note that the node remains a leader until it finds that it has been supplanted by another voter.
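
The timeout check and the step-down rule described above can be sketched roughly as follows. This is only an illustration, not the proposed implementation: `LeaderFetchTracker`, `on_quorum_response`, and all parameter names are hypothetical, and the sketch assumes that the leader counts itself toward the majority and that a voter which has never fetched is timed from the moment the leader was elected.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass
class LeaderFetchTracker:
    """Hypothetical helper: remembers when each voter last fetched from this leader."""

    leader_id: int
    voters: FrozenSet[int]      # all voter ids, including the leader itself
    fetch_timeout_ms: int       # stands in for quorum.fetch.timeout.ms
    elected_at_ms: int          # voters that never fetched are timed from here
    last_fetch_ms: Dict[int, int] = field(default_factory=dict)

    def record_fetch(self, voter_id: int, now_ms: int) -> None:
        """Note a FetchQuorumRecords request from a voter other than the leader."""
        if voter_id in self.voters and voter_id != self.leader_id:
            self.last_fetch_ms[voter_id] = now_ms

    def has_majority_contact(self, now_ms: int) -> bool:
        """True while a majority of voters (leader included) fetched recently."""
        recent = 1  # assumption: the leader counts itself toward the majority
        for voter in self.voters - {self.leader_id}:
            last = self.last_fetch_ms.get(voter, self.elected_at_ms)
            if now_ms - last < self.fetch_timeout_ms:
                recent += 1
        return recent > len(self.voters) // 2


def on_quorum_response(
    my_id: int,
    my_epoch: int,
    resp_epoch: int,
    resp_leader_id: int,
    resp_voters: FrozenSet[int],
) -> Tuple[str, Optional[int]]:
    """Return (role, leader_to_fetch_from) after a discovery response.

    The node stays leader unless the response names a leader with a newer
    epoch. Once supplanted it becomes an observer, and it fetches from the
    new leader only if it is still in the voter list. The text does not say
    what a former leader outside the voter list does next, so None is
    returned in that case.
    """
    if resp_epoch <= my_epoch:
        return ("leader", None)
    fetch_from = resp_leader_id if my_id in resp_voters else None
    return ("observer", fetch_from)
```

When `has_majority_contact` turns false, the leader in this sketch would begin probing other nodes; it keeps the leader role until `on_quorum_response` reports a newer epoch.
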
...
- Upon starting up, a broker always tries to bootstrap its knowledge of the quorum by first reading the quorum-state file and then scanning forward from AppliedOffset to the end of the log to see if there are any changes to the quorum state. For a newly started broker, both the log and the file would be empty, so no previous knowledge can be restored.
- If, after step 1), there is already some known quorum state along with a leader / epoch, the broker would:
  - Promote itself from observer to voter if it finds out that it is a voter for the epoch.
  - Start sending FetchQuorumRecords requests to the current leader it knows (which may not actually be the latest epoch's leader).
- Otherwise, it will try to learn the quorum state by sending DiscoverBrokers to any other brokers inside the cluster via bootstrap.servers, as the second option of quorum state discovery.
  - As long as a broker does not know all the current quorum voters' connections, it should continue to periodically ask other brokers via DiscoverBrokers.
- Send out MetadataRequest to the discovered brokers to find the current metadata partition leader.
  - As long as a broker does not know the current quorum (including the leader and the voters), it should continue to periodically ask other brokers via Metadata.
- If even steps 3) and 4) cannot find any quorum information (e.g. when there are no other brokers in the cluster, or a network partition prevents this broker from talking to others in the cluster), fall back to the third option of quorum state discovery by checking whether it is among the brokers listed in quorum.voters.
  - If so, it will promote itself to voter state, add its own connection information to the cached quorum state, and return that in the DiscoverBrokers responses it sends to other brokers; otherwise it stays in observer state.
  - In either case, it continues to try to send DiscoverBrokers to all other brokers in the cluster via bootstrap.servers.
- For any voter, after it has learned a majority of the voters in the expected quorum from DiscoverBrokers responses, it will begin a vote.
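
The order in which the three discovery options are tried can be summarized with a small sketch. It is illustrative only: `QuorumView`, `bootstrap_decision`, `can_begin_vote`, and the `peers_reachable` flag are hypothetical names, and the sketch assumes that a broker with no restored state acts as an observer while it is still discovering, and that a voter's own entry counts toward the majority of learned voters.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class QuorumView:
    """Quorum state restored from the quorum-state file and the log (hypothetical)."""

    leader_id: int
    epoch: int
    voters: FrozenSet[int]


def bootstrap_decision(
    broker_id: int,
    restored: Optional[QuorumView],
    peers_reachable: bool,
    configured_voters: Set[int],
) -> Tuple[str, str]:
    """Return (role, next request to send) for a starting broker."""
    # First option: a leader / epoch was restored locally (step 2).
    if restored is not None:
        role = "voter" if broker_id in restored.voters else "observer"
        return role, "FetchQuorumRecords"
    # Second option: ask other brokers reached via bootstrap.servers
    # (steps 3 and 4). Assumption: the broker is an observer meanwhile.
    if peers_reachable:
        return "observer", "DiscoverBrokers"
    # Third option: fall back to the static quorum.voters list (step 5).
    # Either way the broker keeps retrying DiscoverBrokers.
    role = "voter" if broker_id in configured_voters else "observer"
    return role, "DiscoverBrokers"


def can_begin_vote(known_voters: Set[int], expected_voters: Set[int]) -> bool:
    """True once a majority of the expected quorum has been learned (step 6)."""
    learned = known_voters & expected_voters
    return len(learned) > len(expected_voters) // 2
```

For example, with an expected quorum of three voters, a voter that has only its own connection information cannot yet begin a vote, but it can once it has learned one more voter.
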
...