...
The URM manages the recovery process for a leaderless partition. This new unclean recovery process replaces unclean leader election. Instead of electing a random unfenced replica as the leader, the URM queries the log end offset and leader epoch of each unfenced replica. The replica with the highest leader epoch and, among those, the longest log end offset becomes the new leader.
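The selection rule can be sketched as follows, assuming the URM has already collected one (leader epoch, log end offset) reply per replica. `ReplicaLogInfo` and `choose_unclean_leader` are hypothetical names, not Kafka APIs. Comparing the pair as a tuple ranks replicas by leader epoch first and uses the log end offset only to break ties within the same epoch. The final tie-break on the smaller broker ID is an added assumption for determinism.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReplicaLogInfo:
    """Hypothetical per-replica reply collected by the URM."""
    broker_id: int
    leader_epoch: int
    log_end_offset: int


def choose_unclean_leader(replies: Iterable[ReplicaLogInfo]) -> Optional[int]:
    """Return the broker ID of the best candidate, or None if no replica replied."""
    best = max(
        replies,
        # Leader epoch dominates; log end offset breaks ties within an epoch;
        # the smaller broker ID wins a full tie (assumption, for determinism).
        key=lambda r: (r.leader_epoch, r.log_end_offset, -r.broker_id),
        default=None,
    )
    return None if best is None else best.broker_id


replies = [
    ReplicaLogInfo(broker_id=1, leader_epoch=5, log_end_offset=120),
    ReplicaLogInfo(broker_id=2, leader_epoch=6, log_end_offset=90),
    ReplicaLogInfo(broker_id=3, leader_epoch=6, log_end_offset=100),
]
print(choose_unclean_leader(replies))  # 3
```

Broker 3 wins even though broker 1 has the longest log, because broker 1's log ends in an older leader epoch.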
...
Only unfenced replicas count toward the quorum, so whenever a replica is unfenced, the URM should check whether it can now be elected. The URM queries all replicas, including fenced ones.
If an unforeseen failure causes the URM to stop retrying the recovery, or the recovery task hangs for a long time, the controller triggers the recovery again when it handles heartbeats.
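The text only states that the controller re-triggers a stalled recovery while handling heartbeats. The sketch below shows one way such a safety net could work; `RecoveryTracker`, `stuck_timeout_s`, and the callback names are hypothetical, and how the controller actually decides that a task has hung is not specified here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class RecoveryTracker:
    """Hypothetical controller-side bookkeeping for unclean recovery tasks."""
    stuck_timeout_s: float                  # assumed limit before a task counts as hung
    start_recovery: Callable[[str], None]   # hands a partition to the URM
    clock: Callable[[], float] = time.monotonic
    started_at: Dict[str, float] = field(default_factory=dict)

    def on_recovery_stopped(self, partition: str) -> None:
        """Called if the URM gives up retrying; the next heartbeat restarts it."""
        self.started_at.pop(partition, None)

    def on_heartbeat(self, leaderless: Set[str]) -> None:
        now = self.clock()
        # Drop partitions that have regained a leader.
        for partition in list(self.started_at):
            if partition not in leaderless:
                del self.started_at[partition]
        for partition in sorted(leaderless):
            started = self.started_at.get(partition)
            # Re-trigger when no task is tracked or the tracked task looks hung.
            if started is None or now - started > self.stuck_timeout_s:
                self.started_at[partition] = now
                self.start_recovery(partition)


now = [0.0]
triggered = []
tracker = RecoveryTracker(30.0, triggered.append, clock=lambda: now[0])
tracker.on_heartbeat({"foo-0"})  # no task yet: start one
now[0] = 10.0
tracker.on_heartbeat({"foo-0"})  # task is young: leave it alone
now[0] = 45.0
tracker.on_heartbeat({"foo-0"})  # 45 s > 30 s: assume it hung and restart
print(triggered)  # ['foo-0', 'foo-0']
```

Driving the check from heartbeats means no separate timer thread is needed; the cost is that a hung task is only noticed on the next heartbeat after the timeout expires.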
...
The kafka-leader-election.sh tool will be upgraded to support manual leader election. It can either directly select a leader, or trigger an unclean recovery, which picks the replica with the longest log, in either Proactive or Balance mode.
- The current partition reassignment can be finished as long as the added replicas are in the ISR.
Public Interfaces
We will deliver the KIP in phases, so each API change is marked as coming with either ELR or Unclean Recovery.
PartitionChangeRecord (Coming with ELR)
PartitionRecord (Coming with ELR)
BrokerRegistration API (Coming with ELR)
DescribeTopicRequest (Coming with ELR)
Should be issued by admin clients or brokers. The controller will serve this request.
...
CleanShutdownFile (Coming with ELR)
ElectLeadersRequest (Coming with Unclean Recovery)
GetReplicaLogInfo Request (Coming with Unclean Recovery)
ACL: Read Topic CLUSTER_ACTION
...
kafka-leader-election.sh (Coming with Unclean Recovery)
...
The following gauge metric will be added for ELR:
- kafka.replication.electable_replicas_count. It is the sum of the ISR size and the ELR size, reported per partition.
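As a worked example of the gauge definition, the sketch below computes the value for one partition. It assumes the ISR and ELR never share a member, so adding the two sizes counts each electable replica exactly once; the function name is hypothetical.

```python
from typing import Sequence


def electable_replicas_count(isr: Sequence[int], elr: Sequence[int]) -> int:
    """Partition-level gauge value: size of the ISR plus size of the ELR."""
    # Assumes the ISR and ELR are disjoint, so no replica is counted twice.
    return len(isr) + len(elr)


# ISR [1] and ELR [2, 3]: three replicas are still eligible to lead.
print(electable_replicas_count([1], [2, 3]))  # 3
```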
The following gauge metrics will be added for Unclean Recovery:
- kafka.replication.unclean_recovery_partitions_count. It counts the partitions that are under unclean recovery, and is unset or set to 0 when no unclean recovery is happening. Note that in Balance mode, a partition whose LastKnownELR members are not all unfenced is also counted as a live recovery.
- kafka.replication.manual_operation_requiredleader_election_required_partition_countunclean_recovery_partitions_count. It counts the partitions that are leaderless and waiting for a user operation to choose the next leader.
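A sketch of how the unclean_recovery_partitions_count gauge could be derived from controller state, including the Balance-mode rule in the list above. `PartitionView` and its fields are hypothetical, and restricting the Balance-mode rule to leaderless partitions is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class PartitionView:
    """Hypothetical snapshot of the controller state the gauge reads."""
    leaderless: bool
    recovery_in_progress: bool   # the URM is actively recovering this partition
    balance_mode: bool           # unclean recovery is configured in Balance mode
    last_known_elr: FrozenSet[int]


def unclean_recovery_partitions_count(
    partitions: Iterable[PartitionView], unfenced: Set[int]
) -> int:
    count = 0
    for p in partitions:
        # Balance mode: a leaderless partition still waiting for its
        # LastKnownELR members to be unfenced counts as a live recovery.
        waiting_in_balance = (
            p.leaderless and p.balance_mode and not p.last_known_elr <= unfenced
        )
        if p.recovery_in_progress or waiting_in_balance:
            count += 1
    return count  # 0 when no unclean recovery is happening


parts = [
    PartitionView(True, True, False, frozenset()),
    PartitionView(True, False, True, frozenset({1, 2})),
    PartitionView(False, False, True, frozenset()),
]
print(unclean_recovery_partitions_count(parts, unfenced={1}))  # 2
```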
...
- ELR. The main differences are in leader election and unclean leader election.
  - Unclean leader election remains unchanged. The meaning of unclean.leader.election.enable does not change, and the controller still selects a random unfenced replica as the leader.
  - Leader election differs when both the ISR and the ELR are empty. In this case, we try to preserve the "last known leader" behavior: when the last leader is fenced, the LastKnownELR field is also updated, with the last leader placed at the front of the list. If the last leader is later unfenced, it is elected as the leader. This way, if only ELR is delivered, availability does not regress.
  - In summary, if unclean.leader.election.enable is false and the ELR is empty, the controller elects the first replica in LastKnownELR as the leader once it is unfenced. If that replica cannot be unfenced, the controller keeps waiting.
- Unclean recovery.
  - Unclean leader election will be replaced by unclean recovery.
  - unclean.leader.election.enable will be replaced by unclean.recovery.strategy only after ELR is delivered.
  - As there is no change to the ISR, the "last known leader" behavior is maintained.
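The ELR-phase behavior described in the list above for a partition whose ISR and ELR are both empty can be sketched as a single decision function. The function name and parameters are hypothetical; returning `None` stands for "keep waiting". Skipping LastKnownELR entirely when unclean.leader.election.enable is true is an assumption the text does not spell out.

```python
import random
from typing import List, Optional, Set


def elect_when_isr_and_elr_empty(
    replicas: List[int],
    last_known_elr: List[int],
    unfenced: Set[int],
    unclean_leader_election_enable: bool,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the new leader's broker ID, or None to keep waiting."""
    if unclean_leader_election_enable:
        # Unchanged unclean election: any unfenced replica, chosen at random.
        candidates = [r for r in replicas if r in unfenced]
        if not candidates:
            return None
        return (rng if rng is not None else random.Random()).choice(candidates)
    # The last leader sits at the front of LastKnownELR; only it may be elected.
    if last_known_elr and last_known_elr[0] in unfenced:
        return last_known_elr[0]
    return None


print(elect_when_isr_and_elr_empty([1, 2, 3], [2, 1], {1}, False))     # None
print(elect_when_isr_and_elr_empty([1, 2, 3], [2, 1], {1, 2}, False))  # 2
```

In the first call, broker 1 is unfenced but is not the last known leader, so with unclean election disabled the controller keeps waiting for broker 2.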
...