Status

Current state: Draft

Discussion thread: TBD

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

During leader election, if none of the in-sync replicas is alive and unclean leader election is enabled (unclean.leader.election.enable=true), the controller elects a replica that was not part of the in-sync replica set. Such a leader is called an unclean leader. Because this replica was not part of the last known in-sync replica set, it may be missing records that had already been committed, so its election can cause data loss in the partition.
Currently, brokers have no way to tell whether a given leader was elected uncleanly. This information would let a broker invoke the appropriate data loss handling routine. One such scenario is handling data loss in a partition that is part of a transaction.

Another scenario is a partition that holds some kind of metadata, where any data loss requires automated or manual intervention.

Changed Interfaces

The LeaderAndIsr data stored in ZooKeeper will gain a boolean flag indicating whether the current leader is unclean.

case class LeaderAndIsr(leader: Int,
                        leaderEpoch: Int,
                        isr: List[Int],
                        zkVersion: Int,
                        isUnclean: Boolean) // proposed: true if the current leader was elected uncleanly

This flag will be sent to the brokers in the LeaderAndIsrRequest as a tagged field of the LeaderAndIsrPartitionState entry in the commonStructs array, as follows:

{ "name": "IsUncleanLeader", "type": "bool", "versions": "4+", "default": "false", "tag": 0,
        "taggedVersions": "4+", "ignorable": true, "about": "Whether the elected leader is unclean." },
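To illustrate how the tagged field behaves on the receiving side, here is a minimal Scala sketch under simplifying assumptions. It models the decoded tagged fields of one partition entry as a `Map[Int, Boolean]` rather than the real byte-level encoding, and `PartitionStateSketch` and `TaggedFieldSketch` are hypothetical names, not Kafka's generated message classes. A receiver that finds no value for tag 0 (because the sender omitted it, or because the request version is below 4) falls back to the schema default of false.

```scala
// Hypothetical, simplified view of one LeaderAndIsrPartitionState entry;
// only the fields needed for this illustration are included.
final case class PartitionStateSketch(
    partitionIndex: Int,
    leader: Int,
    leaderEpoch: Int,
    isUncleanLeader: Boolean
)

object TaggedFieldSketch {
  // Tag number taken from the schema fragment ("tag": 0).
  val IsUncleanLeaderTag: Int = 0

  // Assumption: decoded tagged fields are presented as tag -> value.
  // A missing tag yields the schema default ("default": "false").
  def decode(
      partitionIndex: Int,
      leader: Int,
      leaderEpoch: Int,
      taggedFields: Map[Int, Boolean]
  ): PartitionStateSketch =
    PartitionStateSketch(
      partitionIndex,
      leader,
      leaderEpoch,
      taggedFields.getOrElse(IsUncleanLeaderTag, false)
    )
}
```

Because the field is tagged and ignorable, a sender can omit it without breaking receivers that do not know about it, and absence is indistinguishable from an explicit false.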

The leader replica will use the AlterIsr RPC to reset IsUncleanLeader to false once it has completed the required recovery.
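The following Scala sketch shows one way the controller could validate a leader's request to clear the flag. It assumes the controller applies leader-epoch and version checks similar to those it uses for other ISR updates; `ClearUncleanRequest`, `ClearResult` and `ControllerSketch` are hypothetical names, and the request is reduced to the fields needed for the check.

```scala
// Simplified copy of the proposed LeaderAndIsr state.
final case class LeaderAndIsr(
    leader: Int,
    leaderEpoch: Int,
    isr: List[Int],
    zkVersion: Int,
    isUnclean: Boolean
)

// Hypothetical request: the sender's id plus the state version it last saw.
final case class ClearUncleanRequest(brokerId: Int, leaderEpoch: Int, zkVersion: Int)

sealed trait ClearResult
object ClearResult {
  final case class Updated(state: LeaderAndIsr) extends ClearResult
  case object StaleLeader extends ClearResult
  case object StaleVersion extends ClearResult
}

object ControllerSketch {
  def handle(current: LeaderAndIsr, req: ClearUncleanRequest): ClearResult =
    if (req.brokerId != current.leader || req.leaderEpoch != current.leaderEpoch)
      // The sender is not the current leader for this epoch.
      ClearResult.StaleLeader
    else if (req.zkVersion != current.zkVersion)
      // The state changed since the leader last read it.
      ClearResult.StaleVersion
    else
      // Only the flag changes; the successful write bumps the version.
      ClearResult.Updated(current.copy(zkVersion = current.zkVersion + 1, isUnclean = false))
}
```

The epoch check matters because a leader that has since been replaced by a newer (possibly unclean) election must not clear the flag that belongs to its successor.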

Proposed Changes

This KIP proposes adding a boolean field, IsUncleanLeader, to the LeaderAndIsr state stored in ZooKeeper. When true, it signifies that the current leader was elected as an unclean leader; otherwise it is false. The flag is maintained according to the following rules:

The controller sets it to true when it performs an unclean leader election.

The controller sets it to false when it elects an in-sync replica as the leader.

The leader replica may set it to false once it has completed an appropriate data loss handling routine.

All other operations that mutate the LeaderAndIsr state, such as expandIsr and shrinkIsr, leave the flag unchanged.
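The rules above can be sketched as pure functions over a simplified `LeaderAndIsr` value. This is a hedged illustration, not the controller's implementation: election is reduced to picking the first eligible live replica in assignment order, the unclean case shrinks the ISR to the new leader alone, and `zkVersion` is left unchanged because the ZooKeeper write path that updates it is not modeled. `UncleanFlagRules` and its methods are hypothetical names.

```scala
// Simplified copy of the proposed LeaderAndIsr state.
final case class LeaderAndIsr(
    leader: Int,
    leaderEpoch: Int,
    isr: List[Int],
    zkVersion: Int,
    isUnclean: Boolean
)

// Hypothetical helpers illustrating how each rule treats the flag.
object UncleanFlagRules {
  // Rules 1 and 2: the controller sets the flag while electing a leader.
  // Returns None when no eligible replica is available.
  def electLeader(
      current: LeaderAndIsr,
      assignment: List[Int],
      live: Set[Int],
      uncleanAllowed: Boolean
  ): Option[LeaderAndIsr] =
    assignment.find(r => live.contains(r) && current.isr.contains(r)) match {
      case Some(leader) =>
        // Clean election: a live in-sync replica becomes leader, flag is false.
        Some(current.copy(
          leader = leader,
          leaderEpoch = current.leaderEpoch + 1,
          isr = current.isr.filter(live.contains),
          isUnclean = false))
      case None if uncleanAllowed =>
        // Unclean election: any live replica may lead, flag is true.
        assignment.find(live.contains).map { leader =>
          current.copy(
            leader = leader,
            leaderEpoch = current.leaderEpoch + 1,
            isr = List(leader),
            isUnclean = true)
        }
      case None => None
    }

  // Rule 3: applied once the leader has finished its data loss handling.
  def clearUncleanFlag(current: LeaderAndIsr): LeaderAndIsr =
    current.copy(isUnclean = false)

  // Rule 4: ISR changes keep isUnclean as-is (copy leaves it untouched).
  def shrinkIsr(current: LeaderAndIsr, replica: Int): LeaderAndIsr =
    current.copy(isr = current.isr.filterNot(_ == replica))

  def expandIsr(current: LeaderAndIsr, replica: Int): LeaderAndIsr =
    if (current.isr.contains(replica)) current
    else current.copy(isr = current.isr :+ replica)
}
```

Because `shrinkIsr` and `expandIsr` use `copy` without naming `isUnclean`, the flag carries over unchanged, matching the last rule; only an election or an explicit clear changes it.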

The controller sends a LeaderAndIsrRequest to each replica when it elects a new leader. This KIP proposes including the IsUncleanLeader flag in that request, alongside the rest of the LeaderAndIsrRequest data. The partition leader can use this information to invoke a data loss handling routine.

Once the partition leader has handled the IsUncleanLeader flag in the incoming LeaderAndIsrRequest, it may ask the controller, through the AlterIsr RPC, to set this flag to false in the partition's LeaderAndIsr state.
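A minimal broker-side sketch of this flow follows, assuming the data loss handling routine reports success or failure. `UncleanLeaderHandler`, `recover` and `requestClear` are hypothetical names: `recover` stands in for whatever partition-specific routine applies, and `requestClear` stands in for sending the AlterIsr update to the controller.

```scala
// Hypothetical, simplified partition state as seen by a broker.
final case class PartitionState(
    topic: String,
    partition: Int,
    leader: Int,
    leaderEpoch: Int,
    isUncleanLeader: Boolean
)

final class UncleanLeaderHandler(
    localBrokerId: Int,
    recover: PartitionState => Boolean,   // data loss handling routine
    requestClear: PartitionState => Unit  // e.g. an AlterIsr call to the controller
) {
  // Only the leader reacts, and only when the controller flagged the election
  // as unclean. The clear is requested only after recovery succeeds.
  def onLeaderAndIsr(state: PartitionState): Unit =
    if (state.leader == localBrokerId && state.isUncleanLeader && recover(state))
      requestClear(state)
}
```

Requesting the clear only after a successful recovery keeps the flag set when the routine fails, so the unclean state is not silently lost.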

Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?
If we are changing behavior how will we phase out the older behavior?
If we need special migration tools, describe them here.
When will we remove the existing behavior?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.

KIP-704: Send a hint to broker if it is an unclean leader