...

  1. Reassignments, especially for large topics/partitions, are costly. In some cases, the performance of the Kafka cluster can be severely impacted when reassignments are kicked off. There should be a fast, clean, and safe way to cancel and roll back pending reassignments. For example, with original replicas [1,2,3] and new replicas [4,5,6], if the reassignment causes a performance impact on leader 1, it should be possible to cancel it immediately, revert to the original replicas [1,2,3], and drop the new replicas.
  2. Each batch of reassignments takes as long as its slowest partition; this slowest partition prevents other reassignments from happening. This can happen even when reassignments are submitted with topics/partitions of similar size grouped into each batch. How to optimally group reassignments into batches for faster execution and less impact on the cluster is beyond the scope of this KIP.
  3. The ZooKeeper-imposed limit of 1MB on znode size places an upper limit on the number of reassignments that can be done at a given time. Note that in a real production environment, it is better to do reassignments in batches, with a reasonable number of reassignments in each batch. A large number of reassignments tends to cause higher Producer latency. Between batches, proper staggering and throttling are recommended.
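
To make the znode size limit in item 3 concrete, the following is a minimal sketch that serializes a batch of reassignments in the JSON layout of the /admin/reassign_partitions znode and checks its size. The helper names and the exact byte value used for "1MB" are assumptions for illustration; they are not part of the reassign tool.

```python
import json

# Assumption: the 1MB limit from the text, taken here as 1024 * 1024 bytes.
ZNODE_LIMIT_BYTES = 1024 * 1024


def reassignment_payload(assignments):
    """Serialize {(topic, partition): [replica ids]} as reassignment JSON.

    Hypothetical helper: it only mirrors the znode's JSON layout,
    it does not talk to ZooKeeper.
    """
    partitions = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(assignments.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}).encode("utf-8")


def fits_in_znode(assignments, limit=ZNODE_LIMIT_BYTES):
    """Return True if the serialized batch is within the size limit."""
    return len(reassignment_payload(assignments)) <= limit


# A batch of 100 partitions of one topic, all moving to brokers [4, 5, 6].
batch = {("topic-a", p): [4, 5, 6] for p in range(100)}
payload = reassignment_payload(batch)
print(len(payload), fits_in_znode(batch))
```

A check like this only bounds the batch size from above; as noted in item 3, a much smaller batch is usually preferable in practice for latency reasons.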

...

  • Cancel all pending reassignments currently in /admin/reassign_partitions and revert them back to their original replicas.

  • Adding more partition reassignments while some are still in flight. Although the original design of the reassign tool intended that the znode (/admin/reassign_partitions) not be updated by the tool unless it was empty, there are user requests to support this feature, e.g. KAFKA-7854. This is listed in the "Planned Future Changes" section and may be implemented in another KIP.
  • Development of an AdminClient API which supports the above features.
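
The intended outcome of cancelling a pending reassignment can be sketched for a single partition as follows. This is an illustration only, using the example of original replicas [1,2,3] and new replicas [4,5,6]; the function name is hypothetical and the sketch does not describe how the controller actually implements the rollback.

```python
def cancel_reassignment(original, target):
    """Return (reverted_replicas, replicas_to_drop) for one partition.

    Hypothetical helper: the partition goes back to its original replica
    list, and any broker that appears only in the target list is dropped.
    """
    reverted = list(original)
    to_drop = [broker for broker in target if broker not in original]
    return reverted, to_drop


# Disjoint replica sets: every new replica is dropped.
reverted, dropped = cancel_reassignment([1, 2, 3], [4, 5, 6])
print(reverted, dropped)  # [1, 2, 3] [4, 5, 6]

# Overlapping replica sets: only broker 4 is new, so only it is dropped.
print(cancel_reassignment([1, 2, 3], [2, 3, 4]))  # ([1, 2, 3], [4])
```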

...