Status

Current state: Under Discussion

Discussion thread: TBD

JIRA: KAFKA-7408

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When the controller elects a leader and none of the in-sync replicas is alive, it may elect a replica that was not part of the in-sync replica set. Such a leader is called an unclean leader. Because this replica was not part of the last known in-sync replica set, it may be missing committed records, so electing it can cause data loss in the partition.
Currently, brokers have no way to tell whether a given leader was unclean at the time of its election. A broker could use this information to invoke an appropriate data loss handling routine. One such scenario is handling data loss in a partition that is part of a transaction. Another is a partition that holds some kind of metadata, where any data loss requires automated or manual intervention.

Changed Interfaces

This KIP adds an IsUncleanLeader boolean to the LeaderAndIsrRequest, as a field of LeaderAndIsrPartitionState in the commonStructs array, like so:

{ "name": "IsUncleanLeader", "type": "bool", "versions": "4+", "default": "false", "tag": 10002,
        "taggedVersions": "4+", "ignorable": true, "about": "Whether the elected leader is unclean." },

Brokers will use the AlterISR RPC to set IsUncleanLeader to false.

Proposed Changes

This KIP proposes to add a boolean field to the LeaderAndIsr state maintained in ZooKeeper. The new field will be called IsUncleanLeader. When true, it signifies that the current leader was elected as an unclean leader; otherwise it is false. The flag is maintained according to the following rules:

  • The controller sets it to true when it performs an unclean leader election.
  • The controller sets it to false when it elects an in-sync replica as the leader.
  • The leader replica may set it to false once it has run an appropriate data loss handling routine.
  • All other operations that mutate the LeaderAndIsr state, such as expandIsr and shrinkIsr, leave the flag unchanged.
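
The rules above can be read as a small state machine over the LeaderAndIsr state. The following is a minimal sketch of that reading, not Kafka's actual implementation: the class LeaderAndIsrSketch and its method names are hypothetical, and resetting the ISR to the new leader alone on an unclean election is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical, simplified model of the per-partition LeaderAndIsr state.
// Names and structure are illustrative assumptions, not Kafka's real classes.
final class LeaderAndIsrSketch {
    final int leader;
    final List<Integer> isr;
    final boolean isUncleanLeader;

    LeaderAndIsrSketch(int leader, List<Integer> isr, boolean isUncleanLeader) {
        this.leader = leader;
        this.isr = List.copyOf(isr);
        this.isUncleanLeader = isUncleanLeader;
    }

    // Rules 1 and 2: on election, the flag is true exactly when the chosen
    // replica was not in the last known ISR.
    LeaderAndIsrSketch electLeader(int newLeader) {
        boolean unclean = !isr.contains(newLeader);
        // Assumption: after an unclean election the ISR contains only the new leader.
        List<Integer> newIsr = unclean ? List.of(newLeader) : isr;
        return new LeaderAndIsrSketch(newLeader, newIsr, unclean);
    }

    // Rule 3: the leader may clear the flag after its data loss handling has run.
    LeaderAndIsrSketch clearUncleanLeader() {
        return new LeaderAndIsrSketch(leader, isr, false);
    }

    // Rule 4: ISR expansion or shrinkage keeps the flag as-is.
    LeaderAndIsrSketch withIsr(List<Integer> newIsr) {
        return new LeaderAndIsrSketch(leader, newIsr, isUncleanLeader);
    }
}
```

Modeling the state as immutable makes rule 4 easy to check: any mutation that is not an election or an explicit clear has to copy the flag through.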

The controller sends a LeaderAndIsrRequest to each replica when it elects a new leader. This KIP proposes to send the IsUncleanLeader flag to the replicas along with the rest of the LeaderAndIsrRequest data. The leader of a partition can use this information to invoke a data loss handling routine.

Once the partition leader has handled the IsUncleanLeader flag in the incoming LeaderAndIsrRequest, it may ask the controller to update the LeaderAndIsr state and set the flag to false.
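
On the broker side, the flow described above might look roughly like the sketch below. It is only an illustration under stated assumptions: UncleanLeaderHandlingSketch, DataLossHandler, and FlagResetRequester are hypothetical names, and the reset request stands in for whatever RPC the broker uses to ask the controller to clear the flag.

```java
// Hypothetical broker-side handling of the IsUncleanLeader flag.
// All types here are illustrative assumptions, not Kafka's real API.
final class UncleanLeaderHandlingSketch {
    // Application-specific data loss handling routine.
    interface DataLossHandler {
        void onUncleanLeadership(String topicPartition);
    }

    // Stand-in for the request that asks the controller to clear the flag.
    interface FlagResetRequester {
        void requestClear(String topicPartition);
    }

    private final int localBrokerId;
    private final DataLossHandler handler;
    private final FlagResetRequester requester;

    UncleanLeaderHandlingSketch(int localBrokerId, DataLossHandler handler,
                                FlagResetRequester requester) {
        this.localBrokerId = localBrokerId;
        this.handler = handler;
        this.requester = requester;
    }

    // Called for each partition state in an incoming LeaderAndIsrRequest.
    // Returns true if the data loss routine ran for this partition.
    boolean onLeaderAndIsr(String topicPartition, int leaderId, boolean isUncleanLeader) {
        // Only the elected leader reacts, and only to an unclean election.
        if (leaderId != localBrokerId || !isUncleanLeader) {
            return false;
        }
        handler.onUncleanLeadership(topicPartition);
        // Ask for the flag to be cleared only after handling has completed.
        requester.requestClear(topicPartition);
        return true;
    }
}
```

Requesting the reset only after the handler returns means a broker that fails mid-handling would still see the flag set the next time it receives the partition state.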

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
