Status
Current state: Draft
Discussion thread: TBD
JIRA: KAFKA-7408
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
During leader election, if none of the in-sync replicas is alive, the controller may elect a replica that was not part of the in-sync replica set (this requires unclean.leader.election.enable to be true). Such a leader is called an unclean leader. Because this replica was not in the last known in-sync replica set, it may be missing committed records, so its election can cause data loss in the partition.
Currently, there is no way for brokers to tell whether a given leader was an unclean leader at the time of its election. This information can be useful at the broker in order to invoke the appropriate data loss handling routine. One such scenario is handling data loss in a partition that is part of a transaction.
Another scenario is a partition that holds some kind of metadata, where any data loss requires automated or manual intervention.
Changed Interfaces
- The LeaderAndIsr data stored in ZooKeeper will gain a boolean flag that indicates whether the current leader is unclean.
case class LeaderAndIsr(leader: Int,
                        leaderEpoch: Int,
                        isr: List[Int],
                        zkVersion: Int,
                        isUnclean: Boolean)
- This flag will be sent to the brokers in the LeaderAndIsrRequest, as a new field of the LeaderAndIsrPartitionState struct in the commonStructs array, like so:
{ "name": "IsUncleanLeader", "type": "bool", "versions": "4+", "default": "false", "tag": 0, "taggedVersions": "4+", "ignorable": true, "about": "Whether the elected leader is unclean." },
- The leader replica will use the AlterIsr RPC to set the IsUncleanLeader state to false once it has completed the desired recovery.
Proposed Changes
This KIP proposes to append a boolean state to the LeaderAndIsr state maintained in ZooKeeper. The new boolean state will be called IsUncleanLeader. When set to true, it signifies that the current leader was elected as an unclean leader; it is false otherwise. The flag is maintained according to the following rules:
- Controller will set it to true when it makes an unclean leader election.
- Controller will set it to false when it elects an in-sync replica as the leader.
- Leader replica may set it to false once it has processed an appropriate data loss handling routine.
- All other operations that mutate the LeaderAndIsr state, like expandIsr/shrinkIsr, retain the state as-is.
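The rules above can be read as transitions on the LeaderAndIsr state. The following is a minimal sketch, not the controller's actual code: the object UncleanFlagRules and its method names are hypothetical, and resetting the ISR to only the new leader on an unclean election is an assumption made for illustration.

```scala
// Sketch only: mirrors the LeaderAndIsr case class proposed in this KIP.
final case class LeaderAndIsr(leader: Int,
                              leaderEpoch: Int,
                              isr: List[Int],
                              zkVersion: Int,
                              isUnclean: Boolean)

// Hypothetical helper names; none of these exist in Kafka.
object UncleanFlagRules {

  // Controller elects a leader: the flag is true exactly when the
  // chosen replica was outside the last known ISR.
  def electLeader(state: LeaderAndIsr, newLeader: Int): LeaderAndIsr = {
    val unclean = !state.isr.contains(newLeader)
    state.copy(
      leader = newLeader,
      leaderEpoch = state.leaderEpoch + 1,
      // Assumption: an unclean election resets the ISR to the new leader only.
      isr = if (unclean) List(newLeader) else state.isr,
      isUnclean = unclean)
  }

  // Leader has finished its data loss handling routine: clear the flag.
  def markRecovered(state: LeaderAndIsr): LeaderAndIsr =
    state.copy(isUnclean = false)

  // ISR changes leave the flag untouched.
  def expandIsr(state: LeaderAndIsr, replica: Int): LeaderAndIsr =
    state.copy(isr = state.isr :+ replica)

  def shrinkIsr(state: LeaderAndIsr, replica: Int): LeaderAndIsr =
    state.copy(isr = state.isr.filterNot(_ == replica))
}
```

In this sketch the flag survives expandIsr and shrinkIsr because copy only overrides the isr field, which matches the last rule.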
Controller sends a LeaderAndIsrRequest to each replica when it elects a new leader. This KIP proposes to send the IsUncleanLeader flag to the replicas, along with the rest of the LeaderAndIsrRequest data. The leader of a partition can use this information to invoke a data loss handling routine.
Once the partition leader has handled the IsUncleanLeader flag in the incoming LeaderAndIsrRequest, it may ask the controller to update the partition's LeaderAndIsr state and set this flag to false.
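The broker-side flow described above might look roughly like the sketch below. Everything in it is hypothetical: PartitionState is a simplified stand-in for the per-partition data in a LeaderAndIsrRequest, and DataLossHandler and ControllerChannel are placeholder hooks for the recovery routine and for whatever RPC carries the reset request to the controller.

```scala
// Hypothetical, simplified view of the per-partition state in a LeaderAndIsrRequest.
final case class PartitionState(topic: String,
                                partition: Int,
                                leader: Int,
                                isUncleanLeader: Boolean)

// Hypothetical hooks; these are not Kafka APIs.
trait DataLossHandler {
  def recover(topic: String, partition: Int): Unit
}

trait ControllerChannel {
  def clearUncleanFlag(topic: String, partition: Int): Unit
}

final class UncleanLeaderHandling(brokerId: Int,
                                  handler: DataLossHandler,
                                  controller: ControllerChannel) {

  // Returns true when recovery ran and a flag reset was requested.
  def onLeaderAndIsr(state: PartitionState): Boolean = {
    // Only the elected leader reacts, and only if its election was unclean.
    val mustRecover = state.leader == brokerId && state.isUncleanLeader
    if (mustRecover) {
      handler.recover(state.topic, state.partition)
      // Ask the controller to reset the flag only after recovery has completed.
      controller.clearUncleanFlag(state.topic, state.partition)
    }
    mustRecover
  }
}
```

Running recovery before requesting the reset means that, in this sketch, a leader that fails mid-recovery leaves the flag set to true for whichever replica handles the partition next.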
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.