Status

Current state: Accepted

Discussion thread: https://lists.apache.org/thread/ld2t2xkby7rpgrggqo1h344goddfdnxb

...

If none of the in-sync replicas are alive, the controller allows the user to elect a replica that was not a part of the in-sync replica set using the unclean leader election strategy. Since this new leader replica was not a part of the last known in-sync replica set, data can be lost: log records committed by the previous leader(s) may be deleted. In addition, the loss of these records can cause inconsistency with other parts of the system, such as the transaction coordinators and the group coordinators. If the controller is able to communicate to the new topic partition leader that it was elected using unclean leader election, the new topic partition leader can coordinate this recovery and mitigate these inconsistencies.

...

One of the ways Kafka is going to use this feature is to abort all pending transactions when recovering from unclean leader election. See KAFKA-7408 for more details.

Topic Partition Follower

With KIP-392, it is possible for a follower to receive a FETCH request from a consumer. The follower will return a NOT_LEADER_OR_FOLLOWER error while the leader is still recovering. This means that the follower will return this error from the time it receives a LEADER_AND_ISR request with RECOVERING set until it receives another request with RECOVERED. The follower will not ignore requests that have the same leader epoch.
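The follower behavior described above can be sketched in Java as follows. This is a minimal illustration, not Kafka's replica code: the class FollowerFetchGate, its methods, and the two enums are hypothetical names, and dropping requests with an older leader epoch is an assumption that the text above does not state.

```java
/** Hypothetical follower-side gate; illustrative only, not Kafka's replica manager. */
final class FollowerFetchGate {
    enum LeaderRecoveryState { RECOVERED, RECOVERING }
    enum FetchError { NONE, NOT_LEADER_OR_FOLLOWER }

    private int leaderEpoch = -1;
    private LeaderRecoveryState leaderState = LeaderRecoveryState.RECOVERED;

    /**
     * Applies a LEADER_AND_ISR request. A request with the same leader epoch is
     * applied rather than ignored, so a RECOVERING to RECOVERED update at an
     * unchanged epoch takes effect.
     */
    boolean onLeaderAndIsr(int requestEpoch, LeaderRecoveryState state) {
        if (requestEpoch < leaderEpoch) {
            return false; // assumption: requests with an older epoch are dropped
        }
        leaderEpoch = requestEpoch;
        leaderState = state;
        return true;
    }

    /** Error returned to a consumer that fetches from this follower. */
    FetchError onConsumerFetch() {
        return leaderState == LeaderRecoveryState.RECOVERING
            ? FetchError.NOT_LEADER_OR_FOLLOWER
            : FetchError.NONE;
    }
}
```

In this sketch a consumer fetch fails only between the RECOVERING request and the following RECOVERED request, even when both carry the same leader epoch.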

Controller

ZooKeeper Controller

...

  1. The size of the ISR is greater than 1 and the LeaderRecoveryState is RECOVERING.
  2. The LeaderRecoveryState is changing from RECOVERED to RECOVERING.

If the controller receives an AlterPartition request with a version of 0, it will assume that the leader has recovered from the unclean leader election.
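The two listed conditions and the version 0 rule can be expressed as a small Java sketch. The class and method names are hypothetical. The sketch reads the second condition as a transition from RECOVERED to RECOVERING, and it only reports whether a condition matches, because how the controller responds is specified in text not shown here.

```java
/** Hypothetical helper; illustrative only, not the controller's actual code. */
final class AlterPartitionChecks {
    enum LeaderRecoveryState { RECOVERED, RECOVERING }

    /** A version 0 request has no recovery state field, so RECOVERED is assumed. */
    static LeaderRecoveryState requestedState(short requestVersion,
                                              LeaderRecoveryState stateInRequest) {
        return requestVersion == 0 ? LeaderRecoveryState.RECOVERED : stateInRequest;
    }

    /** Returns true when either of the two listed conditions holds. */
    static boolean matchesListedCondition(int isrSize,
                                          LeaderRecoveryState current,
                                          LeaderRecoveryState requested) {
        // Condition 1: ISR larger than 1 while the requested state is RECOVERING.
        boolean isrTooLarge =
            isrSize > 1 && requested == LeaderRecoveryState.RECOVERING;
        // Condition 2: state moving from RECOVERED back to RECOVERING.
        boolean backToRecovering =
            current == LeaderRecoveryState.RECOVERED
                && requested == LeaderRecoveryState.RECOVERING;
        return isrTooLarge || backToRecovering;
    }
}
```

For example, a request that keeps a single-replica ISR in RECOVERING matches neither condition, while one that grows the ISR to two replicas before recovery completes matches the first.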

Response Schema

The name of the response will be changed to AlterPartitionResponse from AlterIsrResponse. The field CurrentIsrVersion will be renamed to PartitionEpoch.

Add a property to indicate to the leader of the topic partition the recovery state after processing the AlterPartition request. The broker does not take the value in this field as a trigger to start recovery. That happens through either the LeaderAndIsr request or the relevant record in KRaft mode.

Code Block
  {
    "apiKey": 56,
    "type": "response",
    "name": "AlterPartitionResponse",
    "validVersions": "0-1",
    "flexibleVersions": "0+",
    "fields": [
      { "name": "ThrottleTimeMs", "type": "int32", "versions": "0+",
        "about": "The duration in milliseconds for which the request was throttled due to a quota violation, or zero if the request did not violate any quota." },
      { "name": "ErrorCode", "type": "int16", "versions": "0+",
        "about": "The top level response error code" },
      { "name": "Topics", "type": "[]TopicData", "versions": "0+", "fields": [
        { "name":  "Name", "type": "string", "versions": "0+", "entityType": "topicName",
          "about": "The name of the topic" },
        { "name": "Partitions", "type": "[]PartitionData", "versions": "0+", "fields": [
          { "name": "PartitionIndex", "type": "int32", "versions": "0+",
            "about": "The partition index" },
          { "name": "ErrorCode", "type": "int16", "versions": "0+",
            "about": "The partition level error code" },
          { "name": "LeaderId", "type": "int32", "versions": "0+", "entityType": "brokerId",
            "about": "The broker ID of the leader." },
          { "name": "LeaderEpoch", "type": "int32", "versions": "0+",
            "about": "The leader epoch." },
          { "name": "Isr", "type": "[]int32", "versions": "0+", "entityType": "brokerId",
            "about": "The in-sync replica IDs." },
          // ----- Start of properties added by this KIP -----
          { "name": "LeaderRecoveryState", "type": "int8", "versions": "1+", "default": "0",
            "about": "Indicates if the partition is recovering from an election." },
          // ----- End of properties added by this KIP ----- 
          { "name": "PartitionEpoch", "type": "int32", "versions": "0+",
            "about": "The current epoch of the partition." }
        ]}
      ]}
    ]
  }
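As an illustration of how a reader of this response might interpret the new int8 field, the Java sketch below maps the wire value to an enum. The names are hypothetical. That 0 means RECOVERED follows from the schema default and the version 0 rule; treating 1 as RECOVERING is an assumption, since the value assignments are not shown in this section.

```java
/** Hypothetical decoder for the LeaderRecoveryState field; illustrative only. */
final class RecoveryStateField {
    enum State {
        RECOVERED((byte) 0),
        RECOVERING((byte) 1); // assumption: 1 encodes RECOVERING

        final byte value;

        State(byte value) {
            this.value = value;
        }

        static State of(byte value) {
            for (State s : values()) {
                if (s.value == value) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown recovery state: " + value);
        }
    }

    /** Schema default, used when the response version predates the field. */
    static final byte DEFAULT_VALUE = 0;

    /** The field is present only in version 1+; older versions use the default. */
    static State read(short responseVersion, byte wireValue) {
        byte effective = responseVersion >= 1 ? wireValue : DEFAULT_VALUE;
        return State.of(effective);
    }
}
```

Consistent with the description above, the decoded value only reports the state recorded by the controller. It is not a signal for the broker to begin recovery.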

...