...

  1. Reassignments, especially of large topics/partitions, are costly. In some cases, the performance of the Kafka cluster can be severely impacted when reassignments are kicked off. There should be a fast, clean, and safe way to cancel and roll back pending reassignments. For example, with original replicas [1,2,3] and new replicas [4,5,6], if the reassignment causes a performance impact on leader 1, it should be possible to cancel it immediately, revert to the original replicas [1,2,3], and drop the new replicas.
  2. Each batch of reassignments takes as long as its slowest partition, and this slowest partition prevents other reassignments from starting. This can happen even when reassignments are submitted in batches of similarly sized topics/partitions. How to optimally group reassignments into batches for faster execution and less impact on the cluster is beyond the scope of this KIP. This is addressed in the Planned Future Changes section and may be implemented in another KIP.


  1. Currently, reassignment operations are still communicated directly through ZooKeeper. Other administrative operations, such as creating and deleting topics, are moving to the RPC-based KIP-4 wire protocol. Moving from direct ZooKeeper interaction to RPC gives users a recommended path and discourages direct modification of ZooKeeper nodes. This paves the way to locking down ZooKeeper with ACLs so that only brokers need to communicate with ZK.


This change would enable 

  • Cancel all pending reassignments currently in /admin/reassign_partitions and revert them back to their original replicas.

  • Development of an AdminClient API that supports the above features. Change the current administrative APIs to go through RPC instead of ZooKeeper.

Public Interfaces

Strictly speaking, this change does not affect any public interfaces, since ZooKeeper is not considered a public interface and the change can be made in a backward-compatible way. However, some users are known to operate on the /admin/reassign_partitions znode directly. This could break in future versions of Kafka (e.g. as reported in KAFKA-7854), and such operations should be discouraged.

...

$ zkcli -h kafka-zk-host1 ls /kafka-cluster/admin/
[u'reassign_partitions',
 u'delete_topics']

# Current pending reassignment(s)
$ zkcli -h kafka-zk-host1 get /kafka-cluster/admin/reassign_partitions
('{"version":1,"partitions":[{"topic":"test_topic","partition":25,"replicas":[1,2,4],"original_replicas":[1,2,3]}]}', ZnodeStat(czxid=17180484637, mzxid=17180484641, ctime=1549498790668, mtime=1549498790680, version=1, cversion=0, aversion=0, ephemeralOwner=0, dataLength=148, numChildren=0, pzxid=17180484637))

# Cancel the pending reassignments and remove the throttle as well.
$ /usr/lib/kafka/bin/kafka-reassign-partitions.sh  --zookeeper kafka-zk-host1/kafka-cluster --cancel
Rolling back the current pending reassignments Map(test_topic-25 -> Map(replicas -> Buffer(1, 2, 4), original_replicas -> Buffer(1, 2, 3)))
Successfully submitted cancellation of reassignments.

The cancelled pending reassignments throttle was removed.
Please run --verify to have the previous reassignments (not just the cancelled reassignments in progress) throttle removed.

# This is just for illustration purposes. In reality, the cancellation of reassignments should be pretty quick.
# The listing of /admin below might not even show cancel_reassignment_in_progress & reassign_partitions
$ zkcli -h kafka-zk-host1 ls /kafka-cluster/admin/
[u'cancel_reassignment_in_progress',
 u'reassign_partitions',
 u'delete_topics']

# After reassignment cancellation is complete, the ZK nodes /admin/cancel_reassignment_in_progress & /admin/reassign_partitions are gone.
$ zkcli -h kafka-zk-host1 ls /kafka-cluster/admin/
[u'delete_topics']



If the pending reassignments have a throttle, the throttle will be removed after the reassignments are cancelled. However, for reassignments that have already completed, the user needs to remove the throttle by running kafka-reassign-partitions.sh --verify.
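
To make the rollback concrete, the following is a minimal Python sketch of how the /admin/reassign_partitions payload shown in the transcript could be turned into a rollback plan. It is only an illustration: the function name build_rollback_plan and the "restore"/"drop" keys are hypothetical and are not part of Kafka or of this KIP. The only things taken from the source are the JSON field names ("partitions", "topic", "partition", "replicas", "original_replicas").

```python
import json


def build_rollback_plan(znode_payload: str) -> dict:
    """Map "topic-partition" to the replicas to restore and the replicas to drop.

    Hypothetical helper for illustration; not Kafka's implementation.
    """
    data = json.loads(znode_payload)
    plan = {}
    for entry in data.get("partitions", []):
        original = entry.get("original_replicas")
        if not original:
            # Without "original_replicas" there is nothing to roll back to.
            continue
        key = f'{entry["topic"]}-{entry["partition"]}'
        plan[key] = {
            "restore": list(original),
            # New replicas that were not part of the original assignment.
            "drop": [b for b in entry["replicas"] if b not in original],
        }
    return plan


# Payload copied from the znode content in the transcript.
payload = (
    '{"version":1,"partitions":[{"topic":"test_topic","partition":25,'
    '"replicas":[1,2,4],"original_replicas":[1,2,3]}]}'
)
plan = build_rollback_plan(payload)
print(plan)  # {'test_topic-25': {'restore': [1, 2, 3], 'drop': [4]}}
```

For the example payload, broker 4 is the only replica that would be dropped, and the assignment returns to [1, 2, 3].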

Skip Reassignment Cancellation Scenarios

There are a couple of scenarios in which pending reassignments in /admin/reassign_partitions cannot be cancelled or rolled back.

  1. The "original_replicas" field is missing for the topic/partition in /admin/reassign_partitions. In this case, cancellation of the pending reassignment is skipped because there is no way to reset to the original replicas. This can happen for the following reasons:
    1. The user/client tampers with /admin/reassign_partitions directly and does not supply "original_replicas" for the topic.
    2. The user/client uses an incorrect version of the admin client to submit reassignments. The Kafka software should be upgraded not only on all brokers in the cluster but also on the host used to submit reassignments.

  2. None of the "original_replicas" brokers are in the ISR, and some brokers in "new_replicas" are still online for the topic/partition in the pending reassignments. In this case, it is better to skip cancellation/rollback of this topic/partition's pending reassignment; otherwise, the partition would go offline. However, if all brokers in "original_replicas" are offline AND all brokers in "new_replicas" are also offline for this topic/partition, the cluster is already in a bad state and the topic/partition is offline anyway, so the pending reassignment is cancelled and rolled back to "original_replicas".
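
The two skip scenarios above can be sketched as a single decision function. This is one reading of the rules, not the controller's actual code: the function should_cancel and its parameters (isr, live_brokers) are hypothetical names chosen for illustration, and the case where the original replicas are online but out of the ISR while all new replicas are offline is assumed here to fall through to cancellation.

```python
from typing import Optional, Sequence, Set


def should_cancel(
    original_replicas: Optional[Sequence[int]],
    new_replicas: Sequence[int],
    isr: Set[int],
    live_brokers: Set[int],
) -> bool:
    """Return True if the pending reassignment should be cancelled/rolled back.

    Hypothetical helper; an interpretation of the skip rules, not Kafka code.
    """
    # Scenario 1: no "original_replicas" recorded, so there is nothing to revert to.
    if not original_replicas:
        return False

    no_original_in_isr = not any(b in isr for b in original_replicas)
    some_new_online = any(b in live_brokers for b in new_replicas)

    # Scenario 2: rolling back would take the partition offline while the
    # new replicas can still serve it, so skip the cancellation.
    if no_original_in_isr and some_new_online:
        return False

    # Otherwise cancel. This includes the case where every original and every
    # new replica is offline: the partition is offline either way.
    return True


# Normal case: original replicas 1 and 2 are in the ISR.
print(should_cancel([1, 2, 3], [1, 2, 4], isr={1, 2}, live_brokers={1, 2, 3, 4}))
# Skipped: "original_replicas" is missing.
print(should_cancel(None, [4, 5, 6], isr={4}, live_brokers={4, 5, 6}))
# Skipped: no original replica in the ISR, new replicas are online.
print(should_cancel([1, 2, 3], [4, 5, 6], isr={4, 5}, live_brokers={4, 5, 6}))
# Cancelled anyway: everything is offline.
print(should_cancel([1, 2, 3], [4, 5, 6], isr=set(), live_brokers=set()))
```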

Planned Future Changes 

New reassignments while existing reassignments in-flight  

...