...

If none of the in-sync replicas are alive, the controller allows the user to elect a replica that was not part of the in-sync replica set using the unclean leader election strategy. Since this new leader replica was not part of the last known in-sync replica set, data loss is possible: log records committed by the previous leader(s) may be deleted. In addition, the loss of these records can cause inconsistencies with other parts of the system, such as the transaction coordinators and the group coordinators. If the controller is able to communicate to the new topic partition leader that it was elected using unclean leader election, the new topic partition leader can coordinate this recovery and mitigate these inconsistencies.

Proposed Changes

This KIP proposes extending the communication between the controller and brokers to include enough information so that topic partition leaders know if they were elected because of an unclean leader election. This feature needs to support both the traditional ZooKeeper controller and the more recent KRaft controller. The messages sent by the controller to the broker during leader election will be extended to indicate whether the leader was elected using unclean leader election. It is important to note that when the controller performs unclean leader election, the ISR size is 1 and the "is unclean" field will be true. In such cases the topic partition leader will perform any necessary recovery steps.

The broker will notify the controller when it has finished recovering from unclean leader election. An important invariant that the partition leader must satisfy is that the ISR size is 1 while the "is unclean" field is true. This means that the leader will not allow followers to join the ISR until it has recovered from the unclean leader election.
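
The invariant above can be illustrated with a minimal sketch. This is not Kafka's implementation: the class `PartitionLeaderState` and its methods `try_expand_isr` and `finish_recovery` are hypothetical names chosen for illustration, and the broker ids and epoch are made-up values. The sketch only shows the rule that the ISR stays at size 1 while the "is unclean" field is true.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLeaderState:
    """Hypothetical leader-side view of one topic partition (illustration only)."""
    leader: int
    leader_epoch: int
    isr: set = field(default_factory=set)
    is_unclean: bool = False

    def check_invariant(self) -> None:
        # While "is unclean" is true, the ISR size must be 1.
        if self.is_unclean and len(self.isr) != 1:
            raise AssertionError("ISR size must be 1 while is_unclean is true")

    def try_expand_isr(self, follower: int) -> bool:
        # Followers may not join the ISR until recovery has finished.
        if self.is_unclean:
            return False
        self.isr.add(follower)
        return True

    def finish_recovery(self) -> None:
        # After recovery the flag is cleared; per the text above, this is
        # also when the broker would notify the controller.
        self.is_unclean = False


# Assumed example values: broker 1 is elected uncleanly at leader epoch 7.
state = PartitionLeaderState(leader=1, leader_epoch=7, isr={1}, is_unclean=True)
state.check_invariant()
rejected = state.try_expand_isr(2)  # refused while still recovering
state.finish_recovery()
accepted = state.try_expand_isr(2)  # allowed once recovery is done
```

Refusing ISR expansion outright, rather than queueing the request, is a simplification made for this sketch.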

When the topic partition leader recovers from an unclean leader election, it will abort all pending transactions. See KAFKA-7408 for more details.

...

Code Block
{
  "version": 1,
  "leader": NUMBER,
  "leader_epoch": NUMBER,
  "controller_epoch": NUMBER,
  "isr": ARRAY of NUMBERS,
  "is_unclean": BOOLEAN // New property
}
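
As a concrete illustration of the schema above, the following sketch builds and round-trips one partition state value as JSON. The helper `make_partition_state` is a hypothetical name, and the broker id and epochs are assumed values, not taken from the KIP. The size check mirrors the rule that the ISR size is 1 whenever `is_unclean` is true.

```python
import json


def make_partition_state(leader, leader_epoch, controller_epoch, isr, is_unclean):
    """Build a dict following the schema above (hypothetical helper)."""
    # An unclean election implies an ISR of size 1.
    if is_unclean and len(isr) != 1:
        raise ValueError("ISR size must be 1 while is_unclean is true")
    return {
        "version": 1,
        "leader": leader,
        "leader_epoch": leader_epoch,
        "controller_epoch": controller_epoch,
        "isr": list(isr),
        "is_unclean": is_unclean,
    }


# Assumed example: broker 3 elected uncleanly at leader epoch 12.
encoded = json.dumps(make_partition_state(3, 12, 5, [3], True))
decoded = json.loads(encoded)
```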

...