...

Behavior during failover:

  • Broker failover.

    • If the replica fails before it receives the GetReplicaLogInfo request, it can simply respond with the log info along with its current broker epoch.

    • If the replica fails after it responds to the GetReplicaLogInfo request:

      • If the controller has received the new broker registration, it can reject the response because the broker epoch the response carries does not match the broker registration.

      • Otherwise, the replica may become the leader but will be fenced later when it registers.

  • Controller failover.

    • The controller does not store anything in the metadata log, so every controller failover results in a new unclean recovery.
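
The broker-epoch check in the broker failover case can be illustrated with a minimal sketch. The class and method names below (ReplicaLogInfoEpochCheck, register, acceptResponse) are hypothetical and are not part of the Kafka controller code. The sketch assumes only that the controller remembers the epoch from each broker's latest registration and compares it with the epoch carried by the log info response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual controller implementation.
final class ReplicaLogInfoEpochCheck {
    // Broker id -> broker epoch from the most recent registration (assumed bookkeeping).
    private final Map<Integer, Long> registeredEpochs = new HashMap<>();

    // Records a (re-)registration; a restarted broker registers with a new epoch.
    void register(int brokerId, long brokerEpoch) {
        registeredEpochs.put(brokerId, brokerEpoch);
    }

    // Accepts a log info response only if its epoch equals the registered epoch.
    boolean acceptResponse(int brokerId, long responseBrokerEpoch) {
        Long registered = registeredEpochs.get(brokerId);
        return registered != null && registered.longValue() == responseBrokerEpoch;
    }

    public static void main(String[] args) {
        ReplicaLogInfoEpochCheck check = new ReplicaLogInfoEpochCheck();
        check.register(1, 100L);
        System.out.println("before restart: " + check.acceptResponse(1, 100L));
        // The broker restarts and registers again before the controller handles the old response.
        check.register(1, 101L);
        System.out.println("after restart:  " + check.acceptResponse(1, 100L));
    }
}
```

In this sketch a response sent before the restart is rejected once the new registration has been recorded. If the registration arrives later, the stale response is still accepted, which corresponds to the case where the replica may become the leader and is fenced when it registers.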

Other

  1. The kafka-leader-election.sh tool will be extended to support manual leader election.

    1. It can directly select a leader.

    2. It can trigger an unclean recovery for the replica with the longest log in either Proactive or Balanced mode.

  2. Configs to add
    1. unclean.recovery.strategy. Described in the section above. The default value is Balanced.
    2. unclean.recovery.Enabled. Set to true to enable unclean recovery and to false to disable it. The default value is false.
    3. unclean.recovery.timeout.ms. The maximum time to wait for the replicas' responses during an unclean recovery. The default value is 5 minutes.
  3. For a better user experience, unclean.recovery.strategy and unclean.leader.election.enable will be converted into each other whenever unclean.recovery.Enabled is changed.
    1. unclean.recovery.Enabled from false to true

      unclean.leader.election.enable | unclean.recovery.strategy
      false                          | Balanced
      true                           | Proactive


    2. unclean.recovery.Enabled from true to false

      unclean.recovery.strategy | unclean.leader.election.enable
      Proactive                 | true
      Balanced                  | false
      Manual                    | false
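
The two conversion tables above can be written as a pair of small mapping functions. This is a sketch for illustration only: the class, enum, and method names are hypothetical, and the proposal does not say where or how the conversion is implemented.

```java
// Hypothetical illustration of the config conversion tables; not Kafka code.
final class UncleanRecoveryConfigConversion {
    // Values of unclean.recovery.strategy named in the tables.
    enum Strategy { PROACTIVE, BALANCED, MANUAL }

    // Applied when unclean.recovery.Enabled changes from false to true:
    // derives the strategy from unclean.leader.election.enable.
    static Strategy strategyFor(boolean uncleanLeaderElectionEnable) {
        return uncleanLeaderElectionEnable ? Strategy.PROACTIVE : Strategy.BALANCED;
    }

    // Applied when unclean.recovery.Enabled changes from true to false:
    // derives unclean.leader.election.enable from the strategy.
    static boolean leaderElectionEnableFor(Strategy strategy) {
        return strategy == Strategy.PROACTIVE;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> unclean.leader.election.enable="
                    + leaderElectionEnableFor(s));
        }
    }
}
```

Under these tables the conversion is not a perfect round trip: Manual maps to false when unclean recovery is disabled, and false maps to Balanced when it is enabled again.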


Public Interfaces

We will deliver the KIP in phases, so each API change is marked as shipping with either ELR or Unclean Recovery.

...