...

Unclean Recovery uses a deterministic procedure to elect the replica that has persisted the most data as the leader. At a high level, once unclean recovery is triggered, the controller uses a new API, GetReplicaLogInfo, to query the log end offset and the leader epoch from each replica. The replica with the highest leader epoch and the longest log end offset becomes the new leader. To help explain when and how Unclean Recovery is performed, let's first introduce some config changes.
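As a rough illustration of the selection rule, the sketch below ranks replies by leader epoch first and log end offset second. The `ReplicaLogInfo` class, the `pick_leader` function, and the lowest-broker-id tie-break are assumptions made for this example; the text above does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaLogInfo:
    """Hypothetical shape of one GetReplicaLogInfo reply."""
    broker_id: int
    leader_epoch: int
    log_end_offset: int


def pick_leader(replies):
    """Return the broker id with the highest (leader_epoch, log_end_offset).

    Assumption: leader epoch is compared first and log end offset second.
    Remaining ties go to the lowest broker id so the result stays deterministic.
    """
    if not replies:
        return None
    best = max(
        replies,
        key=lambda r: (r.leader_epoch, r.log_end_offset, -r.broker_id),
    )
    return best.broker_id


replies = [
    ReplicaLogInfo(broker_id=1, leader_epoch=4, log_end_offset=120),
    ReplicaLogInfo(broker_id=2, leader_epoch=5, log_end_offset=90),
    ReplicaLogInfo(broker_id=3, leader_epoch=5, log_end_offset=95),
]
print(pick_leader(replies))  # 3: epoch 5 beats epoch 4, then offset 95 beats 90
```

Broker 1 has the longest log in this example but loses because its leader epoch is older.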

The new unclean.recovery.strategy has the following 3 options.

Aggressive. Recover availability as fast as possible.
Balanced. Recover automatically when data loss is possible, but wait as needed for a better result.
None. Stop the partition when data loss is possible.

...

  1. If there are other ISR members, choose an ISR member.

  2. If there are unfenced ELR members, choose an ELR member.

  3. If there are fenced ELR members

    1. If unclean.recovery.strategy=Aggressive, then an unclean recovery will happen.

    2. Otherwise, we will wait for the fenced ELR members to be unfenced.

  4. If there are no ELR members.

    1. If unclean.recovery.strategy=Aggressive, the controller will do the unclean recovery.

    2. If unclean.recovery.strategy=Balanced, the controller will do the unclean recovery when all the LastKnownELR members are unfenced. See the following section for the explanation.
    3. Otherwise (unclean.recovery.strategy=None), the controller will not attempt to elect a leader and will wait for user intervention.
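The decision steps above can be sketched as a single function. This is only an illustration under stated assumptions: the `Strategy` and `Action` enums, the `next_action` function, and its parameters are hypothetical names, not part of the proposal.

```python
from enum import Enum


class Strategy(Enum):
    AGGRESSIVE = "Aggressive"
    BALANCED = "Balanced"
    NONE = "None"


class Action(Enum):
    ELECT_FROM_ISR = "elect an ISR member"
    ELECT_FROM_ELR = "elect an unfenced ELR member"
    WAIT_FOR_ELR_UNFENCED = "wait for a fenced ELR member to be unfenced"
    UNCLEAN_RECOVERY = "run unclean recovery"
    WAIT_FOR_LAST_KNOWN_ELR = "wait until all LastKnownELR members are unfenced"
    WAIT_FOR_USER = "wait for user intervention"


def next_action(strategy, isr, unfenced_elr, fenced_elr,
                last_known_elr_all_unfenced=False):
    """Map the partition state to the controller's next step (hypothetical helper)."""
    if isr:                      # step 1: other ISR members exist
        return Action.ELECT_FROM_ISR
    if unfenced_elr:             # step 2: unfenced ELR members exist
        return Action.ELECT_FROM_ELR
    if fenced_elr:               # step 3: only fenced ELR members exist
        if strategy is Strategy.AGGRESSIVE:
            return Action.UNCLEAN_RECOVERY
        return Action.WAIT_FOR_ELR_UNFENCED
    # step 4: no ELR members at all
    if strategy is Strategy.AGGRESSIVE:
        return Action.UNCLEAN_RECOVERY
    if strategy is Strategy.BALANCED:
        if last_known_elr_all_unfenced:
            return Action.UNCLEAN_RECOVERY
        return Action.WAIT_FOR_LAST_KNOWN_ELR
    return Action.WAIT_FOR_USER


print(next_action(Strategy.BALANCED, isr=[], unfenced_elr=[], fenced_elr=[2]))
```

With only a fenced ELR member and the Balanced strategy, the sketch waits for that member to be unfenced instead of starting an unclean recovery.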

...

  • In Balanced mode, all the LastKnownELR members have replied, plus any other replicas that replied within the timeout. Because of this requirement, the controller only starts the recovery once all the LastKnownELR members are unfenced.
  • In Aggressive mode, any replicas that replied within a fixed amount of time, or the first response received after the timeout.
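One possible reading of these two conditions is sketched below. The `ready_to_elect` function and its parameters are hypothetical. The sketch also assumes that Balanced mode waits for the timeout window to close before electing, which the text above does not state explicitly.

```python
def ready_to_elect(strategy, replied, last_known_elr, timed_out):
    """Decide whether the collected replies are enough to pick a leader.

    strategy:       "Balanced" or "Aggressive"
    replied:        broker ids that have answered so far
    last_known_elr: broker ids in LastKnownELR
    timed_out:      True once the response timeout has elapsed

    Meant to be re-evaluated whenever a reply arrives or the timeout fires.
    """
    replied = set(replied)
    if strategy == "Balanced":
        # Every LastKnownELR member must have replied. Other replicas only
        # count if they answered before the timeout (assumed wait).
        return timed_out and set(last_known_elr) <= replied
    if strategy == "Aggressive":
        # Use whoever replied by the timeout. If nobody did, the first
        # reply that arrives afterwards makes this condition true.
        return timed_out and len(replied) > 0
    raise ValueError(f"unsupported strategy: {strategy}")


print(ready_to_elect("Balanced", replied={1, 3}, last_known_elr={1, 2}, timed_out=True))
print(ready_to_elect("Aggressive", replied={3}, last_known_elr={1, 2}, timed_out=True))
```

In the first call, broker 2 is in LastKnownELR but has not replied, so Balanced mode keeps waiting. Aggressive mode proceeds with broker 3 alone.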

...

  1. The kafka-leader-election.sh tool will be upgraded to allow manual leader election.

    1. It can directly select a leader.

    2. It can trigger an unclean recovery for the replica with the longest log in either Aggressive or Balanced mode.

  2. Configs to update
    1. unclean.recovery.strategy. Described in the above section. Balanced is the default value.
    2. unclean.recovery.manager.enabled. True for using the unclean recovery manager to perform an unclean recovery. False otherwise. False is the default value.
    3. unclean.recovery.timeout.ms. The time limit for waiting for the replicas' responses during the Unclean Recovery. 5 min is the default value.
  3. For compatibility, the original unclean.leader.election.enable options True/False will be mapped to unclean.recovery.strategy options.
    1. unclean.leader.election.enable=false -> unclean.recovery.strategy=Balanced
    2. unclean.leader.election.enable=true -> unclean.recovery.strategy=Aggressive
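The mapping from the legacy flag can be illustrated with a small helper. This is a sketch only: the `effective_strategy` function is hypothetical, and letting an explicitly set unclean.recovery.strategy take precedence over the legacy flag is an assumption that the text does not confirm.

```python
# Legacy boolean value -> new strategy, as listed above.
LEGACY_TO_STRATEGY = {True: "Aggressive", False: "Balanced"}


def effective_strategy(configs):
    """Resolve the recovery strategy from a dict of config strings."""
    explicit = configs.get("unclean.recovery.strategy")
    if explicit is not None:
        return explicit  # assumed precedence: the new config wins when set
    legacy = configs.get("unclean.leader.election.enable")
    if legacy is None:
        return "Balanced"  # documented default of unclean.recovery.strategy
    return LEGACY_TO_STRATEGY[str(legacy).strip().lower() == "true"]


print(effective_strategy({"unclean.leader.election.enable": "true"}))   # Aggressive
print(effective_strategy({"unclean.leader.election.enable": "false"}))  # Balanced
print(effective_strategy({}))                                           # Balanced
```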

Public Interfaces

We will deliver the KIP in phases, so each API change is marked as shipping with either ELR or Unclean Recovery.

...

...
// Updated field starts.
--election-type <[PREFERRED, UNCLEAN, LONGEST_LOG_AGGRESSIVE, LONGEST_LOG_BALANCED, DESIGNATION]:
                                          Type of election to attempt. Possible
  election type>                          values are "preferred" for preferred
                                          leader election, or "unclean" for
                                          a random unclean leader election,
                                          or "longest_log_aggressive"/"longest_log_balanced"
                                          to choose the replica
                                          with the longest log or "designation" for
                                          electing the given replica to be the leader. If
                                          preferred election is selected, the
                                          election is only performed if the
                                          current leader is not the preferred
                                          leader for the topic partition. If
                                          longest_log_aggressive/longest_log_balanced/designation
                                          election is selected, the
                                          election is only performed if there
                                          is no leader for the topic
                                          partition. REQUIRED.
--path-to-json-file <String: Path to    The JSON file with the list of
  JSON file>                              partitions for which leader elections
                                          should be performed. This is an
                                          example format. The desiredLeader field
                                          is only required in DESIGNATION election.
                                        
                                        {"partitions":
                                        	[{"topic": "foo", "partition": 1, "desiredLeader": 0},
                                        	 {"topic": "foobar", "partition": 2, "desiredLeader": 1}]
                                        }
                                        Not allowed if --all-topic-partitions
                                          or --topic flags are specified.
// Updated field ends.
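To make the file format concrete, the sketch below builds the JSON payload for a DESIGNATION election and prints an illustrative command line. The `election_payload` helper, the file name `election.json`, and the `localhost:9092` address are assumptions for this example. The DESIGNATION election type is the one proposed in this document and may not exist in a released tool.

```python
import json


def election_payload(partitions, designation=False):
    """Build the JSON text for --path-to-json-file.

    partitions: iterable of (topic, partition, desired_leader) tuples.
    desiredLeader is written only for a DESIGNATION election.
    """
    entries = []
    for topic, partition, desired_leader in partitions:
        entry = {"topic": topic, "partition": partition}
        if designation:
            entry["desiredLeader"] = desired_leader
        entries.append(entry)
    return json.dumps({"partitions": entries}, indent=2)


payload = election_payload([("foo", 1, 0), ("foobar", 2, 1)], designation=True)
print(payload)

# Illustrative invocation only; the broker address and file name are placeholders.
command = [
    "kafka-leader-election.sh",
    "--bootstrap-server", "localhost:9092",
    "--election-type", "DESIGNATION",
    "--path-to-json-file", "election.json",
]
print(" ".join(command))
```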

...

Unclean Recovery is guarded by the feature flag unclean.recovery.manager.enabled.

  • For the existing unclean.leader.election.enable
    1. If true, unclean.recovery.strategy will be set to Aggressive.

    2. If false, unclean.recovery.strategy will be set to Balanced.

  • unclean.leader.election.enable will be marked as deprecated.

Delivery plan

The KIP is a large plan that may span multiple quarters, so we have to consider how to deliver the project in phases.

...

In this model, broker 2 is not likely to have the complete log, so simply requiring a fixed number of responses does not improve durability much.

Using a different set of configs

We also considered deprecating unclean.leader.election.enable and using unclean.recovery.strategy (Manual/Balanced/Proactive) instead. That would require a config conversion when the unclean recovery manager is enabled.

...
