Table of Contents

Status

Current state: DiscussionAdopted

Discussion thread: here

JIRA:

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	KAFKA-8834

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	KAFKA-8835

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	KAFKA-9059

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

While a reassignment is in progress, the number of replicas for a partition being reassigned temporarily increases beyond the replication factor. Once all new replicas are in the ISR, the old replicas are removed and the number of replicas again matches the replication factor. Until that point, however, the partition is treated as under-replicated both from the perspective of metrics and from the topic command utility. This is misleading because the partitions may satisfy the required replication factor throughout the reassignment. This has two major drawbacks:

URPs cannot easily be used for alerting because they are expected during a reassignment. This can obscure actual replication problems while a reassignment is in progress.
We cannot easily isolate the reassignment load with a throttle. Kafka supports replication throttles which exclude ISR traffic, but if there is an unexpected URP during a reassignment, the formerly in-sync replica will get hit with the throttle. This not only makes it more difficult to rejoin the ISR, it takes traffic from the reassignment.

In this KIP, we propose to distinguish the URPs caused by reassignment.

Proposed Changes

The problem at the moment is that only the controller knows about the reassignment. Partition leaders just see a single replica set. We propose to have the controller propagate the reassignment state to the leaders. We will distinguish between the current set of replicas and the impending set of replicas. The impending replica set will contain the new replica assignment while the reassignment is in progress.

Public Interfaces

Request APIs

We will modify the UpdateMetadata and the LeaderAndIsr request APIs to allow the controller to propagate the new reassignment to the leaders. The new LeaderAndIsr request schema is given below:

Code Block

linenumbers	true

LeaderAndIsrRequest => ControllerId ControllerEpoch [PartitionState] [LiveLeader]
  ControllerId => INT32
  ControllerEpoch => INT32
  PartitionState => TopicName PartitionId ControllerEpoch LeaderId LeaderEpoch ISR ZkVersion ActiveReplicas ImpendingReplicas IsNew
    TopicName => STRING
    PartitionId => INT32
    ControllerEpoch => INT32
    LeaderId => INT32
    LeaderEpoch => INT32
    IsNew => BOOLEAN
    ZkVersion => INT32
    ISR => [INT32]
    CurrentReplicas => [INT32]     // New
    ImpendingReplicas => [INT32]  // New

Similar changes will be made to the UpdateMetadata request.

Code Block

linenumbers	true

UpdateMetadataRequest => ControllerId ControllerEpoch [PartitionState] [LiveLeader]
  ControllerId => INT32
  ControllerEpoch => INT32
  PartitionState => TopicName PartitionId ControllerEpoch LeaderId LeaderEpoch ISR ZkVersion ActiveReplicas ImpendingReplicas
    TopicName => STRING
    PartitionId => INT32
    ControllerEpoch => INT32
    LeaderId => INT32
    LeaderEpoch => INT32
    ZkVersion => INT32
    ISR => [INT32]
    CurrentReplicas => [INT32]     // New
    ImpendingReplicas => [INT32]  // New

The response schemas for both APIs will match the previous version.

Metrics

new replicas are trying to catch up and are not in the ISR. The broker considers these partitions "under-replicated" even if the desired replication factor is always satisfied. This is misleading and makes URP metrics difficult to use for alerts. In KIP-455, we gave the leader a way to detect a reassignment. Specifically, the LeaderAndIsr request now has a separate field for the replicas which are being added and those that are being removed. This allows us to compute a more useful metric value.

Proposed Changes

We will change the semantics of the "UnderReplicated" metric to taking into account the AddingReplicas. Specifically, we will use the following formula:

Code Block
isUnderReplicated == size(original assigned replicas) - size(isr) > 0

We count a partition as under-replicated if the current isr is smaller than the size of the current replica set. This allows us to count AddingReplicas which makes this metric consistent with UnderMinIsr criteria. Note that a reassignment may change the number of replicas, but URP satisfaction will not take this into account until the reassignment is complete.

Similarly, we will change the behavior of the kafka topic command so that `--under-replicated-partitions` returns results consistent with the change above. Because the adding/removing replicas are not visible from the Metadata API, we will use the new ListReassignment API.

Additionally, we are adding a couple new metrics to track the progress of an active reassignment. These are described below.

Public Interfaces

As described above, this KIP changes the semantics of `kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions`. Replicas which are being added as part of a reassignment will not count toward this value.

We will also add some additional metrics to improve monitoring for reassignments. The table below shows all of the changes from this KIP.

Metric	Is New	Type	Includes Current Assigned Replicas	Includes Reassigning Replicas
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions	No	Gauge	Yes	No
kafka.server:type=ReplicaManager,name=ReassigningPartitions	Yes	Gauge	No	Yes
kafka.server:type=ReplicaManager,name=ReassignmentMaxLag	Yes	Gauge	No	Yes
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec	Yes	Meter	No	Yes
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec	Yes	Meter	No	Yes

Note that the `ReassignmentBytesOutPerSec` and `ReassignmentBytesInPerSec` meters are broker-level metrics. We are not proposing any topic-level metrics for tracking reassignment progress.

ReassignmentMaxLag will be implemented separately as it requires some more consideration. JIRA is linked on the top of the KIPWe will change the semantics of the "UnderReplicated" metric to count only the partitions which are under-replicated from the perspective of the active replica set. We will add a new metric "ReassignedCount" which tracks the number of replicas which are currently being reassigned.

Compatibility, Deprecation, and Migration Plan

...

Space shortcuts

Child pages

Versions Compared

Old Version 3

New Version Current

Key

Status

Proposed Changes

Public Interfaces

Request APIs

Metrics

Proposed Changes

Public Interfaces

Compatibility, Deprecation, and Migration Plan

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 3

New Version Current

Key

Status

Proposed Changes

Public Interfaces

Request APIs

Metrics

Proposed Changes

Public Interfaces

Compatibility, Deprecation, and Migration Plan