...

The broker will notify the controller when it has finished recovering from an unclean leader election. An important invariant that the partition leader must satisfy is that the ISR size is 1 while the "is unclean" field is true. This means that the leader will not allow followers to join the ISR until it has recovered from the unclean leader election.

While the leader is recovering from an unclean leader election, it will return a NOT_LEADER_OR_FOLLOWER error for the FETCH, PRODUCE, LIST_OFFSETS, DELETE_RECORDS and OFFSET_FOR_LEADER_EPOCH requests.
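The invariant and the request handling described above can be sketched as a small state model. This is only an illustration under stated assumptions, not Kafka's implementation: the names LeaderState, Error, try_add_to_isr and finish_recovery are hypothetical, and the enum values are placeholders rather than Kafka's wire error codes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Error(Enum):
    # Placeholder values; these are NOT Kafka's wire error codes.
    NONE = auto()
    NOT_LEADER_OR_FOLLOWER = auto()


# Request types the text lists as rejected while the leader is recovering.
BLOCKED_WHILE_RECOVERING = frozenset(
    {"FETCH", "PRODUCE", "LIST_OFFSETS", "DELETE_RECORDS", "OFFSET_FOR_LEADER_EPOCH"}
)


@dataclass
class LeaderState:
    """Hypothetical model of a partition leader's view of its own state."""

    leader_id: int
    is_unclean: bool = False
    isr: set = field(default_factory=set)

    def __post_init__(self):
        self.isr.add(self.leader_id)
        self._check_invariant()

    def _check_invariant(self):
        # Invariant from the text: ISR size is 1 while "is unclean" is true.
        if self.is_unclean and len(self.isr) != 1:
            raise ValueError("ISR size must be 1 while 'is unclean' is true")

    def try_add_to_isr(self, follower_id: int) -> bool:
        # Followers may not join the ISR until recovery has finished.
        if self.is_unclean:
            return False
        self.isr.add(follower_id)
        return True

    def finish_recovery(self) -> None:
        # In the real protocol the leader would also notify the controller here.
        self.is_unclean = False

    def handle(self, request_type: str) -> Error:
        if self.is_unclean and request_type in BLOCKED_WHILE_RECOVERING:
            return Error.NOT_LEADER_OR_FOLLOWER
        return Error.NONE
```

In this sketch a leader created with is_unclean=True rejects every listed request type and refuses ISR expansion until finish_recovery() is called.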

One of the ways Kafka is going to use this feature is to abort all pending transactions when recovering from an unclean leader election. See KAFKA-7408 (ASF JIRA) for more details.

...

If the ZooKeeper controller supports the feature, then when performing an unclean leader election it will write "true" in ZK and set "true" in the LeaderAndIsr request. If the broker doesn't support this feature, it will ignore the "is unclean" field in the LeaderAndIsr request and behave as it currently does. When such a broker sends the AlterIsr request to the controller, the controller will interpret the "is unclean" field as "false" because the default value is false. Similar logic applies to the KRaft controller.
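The two compatibility rules above (an older broker ignores the field, and the controller treats a missing field as false) can be sketched as follows. Requests are modeled as plain dictionaries, and the function names and the "is_unclean" key are hypothetical; the real requests are versioned Kafka protocol messages.

```python
def broker_starts_recovery(leader_and_isr: dict, supports_feature: bool) -> bool:
    """Return True if the broker should begin unclean recovery.

    A broker without the feature ignores the field entirely and behaves
    as it did before this KIP.
    """
    if not supports_feature:
        return False
    return bool(leader_and_isr.get("is_unclean", False))


def controller_reads_is_unclean(alter_isr: dict) -> bool:
    """Interpret the field in an AlterIsr request.

    A broker that does not know the field never sets it, so the controller
    sees the default value, false.
    """
    return bool(alter_isr.get("is_unclean", False))
```

For example, an AlterIsr request built by an older broker has no "is_unclean" key, so controller_reads_is_unclean returns False for it.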

Clients

With this KIP, the FETCH, PRODUCE, LIST_OFFSETS, DELETE_RECORDS and OFFSET_FOR_LEADER_EPOCH requests will return a NOT_LEADER_OR_FOLLOWER error for any topic partition whose leader is recovering. This is backward compatible because clients retry on this error.

For FETCH requests, the replicas will handle this error by backing off for "replica.fetch.backoff.ms". The consumers will handle this error by queuing a full metadata request for the next metadata request interval.

For PRODUCE requests, the producers will re-queue the request if it is within the retry window.
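The client-side reactions described above amount to simple time arithmetic. The sketch below is illustrative only: the function names and the retry_window_ms parameter are hypothetical, and replica_fetch_backoff_ms stands for whatever value "replica.fetch.backoff.ms" is configured to.

```python
def replica_next_fetch_ms(now_ms: int, replica_fetch_backoff_ms: int) -> int:
    """Earliest time a follower retries FETCH after NOT_LEADER_OR_FOLLOWER."""
    return now_ms + replica_fetch_backoff_ms


def producer_should_requeue(first_attempt_ms: int, now_ms: int, retry_window_ms: int) -> bool:
    """Re-queue a PRODUCE request only while it is still inside its retry window."""
    return now_ms - first_attempt_ms < retry_window_ms
```

With a window of 1000 ms, a request first sent at t=0 is re-queued at t=999 but not at t=1000 in this sketch; whether the boundary is inclusive is an assumption made here.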

Rejected Alternatives

An alternative solution is to store the leader epoch when the unclean leader election was performed instead of storing a boolean in the "is unclean" field. The topic partition would perform unclean recovery when the unclean leader epoch is equal to the current leader epoch. One issue with this solution is that the controller changes the leader epoch when the leader goes offline. This means that the controller would have to also reset the unclean leader epoch when the leader goes offline.