...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The current kafka-reassign-partitions.sh tool imposes the limitation that only a single batch of partition reassignments can be in-flight, and it is not possible to cancel a reassignment that is in-flight. This has a number of consequences:

...

This change would enable:

  • Adding more partition reassignments while some are still in-flight.
  • Cancelling individual partition reassignments (by reverting the reassignment to the old set of brokers).
  • Development of an AdminClient API which supports the above features.

To illustrate the last bullet, consider an AdminClient API for partition reassignment that returns a Future providing access to a ReassignmentPartitionsResult which (implicitly) includes the identity of each partition's reassignment. Further AdminClient APIs can be added to:

  • query the status of a particular reassignment
  • list all current reassignments
  • change a current reassignment
  • scope a throttle to the duration of a reassignment

Public Interfaces

Strictly speaking, this change would not affect any public interfaces: ZooKeeper is not considered a public interface, and the change can be made in a backward-compatible way. However, since some users are known to operate on the /admin/reassign_partitions znode directly, I felt it was worthwhile using the KIP process for this change.

...

The controller will also watch for changes in the contents of these znodes, and re-initiate reassignment if the content changes. This means it is possible to "cancel" the reassignment of a single partition by changing the reassignment to the old assigned replicas. Note, however, that the controller itself doesn't remember the old assigned replicas; it is up to the client to record the old state prior to starting a reassignment if cancellation is necessary for that client.
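The client-side bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the method names and the in-memory map standing in for the reassignment znodes are all hypothetical, and no real ZooKeeper or Kafka API is used. It also assumes that when a reassignment is re-targeted while in-flight, the client wants to keep the assignment from before the first reassignment as the "old" state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of Kafka.
final class CancellableReassignments {
    // Stand-in for the reassignment znodes: partition key -> target replicas.
    private final Map<String, List<Integer>> znodes = new HashMap<>();
    // Client-side record of the replicas assigned before the reassignment started.
    private final Map<String, List<Integer>> previous = new HashMap<>();

    // Start (or re-target) a reassignment, remembering the original replicas.
    void start(String partition, List<Integer> currentReplicas, List<Integer> targetReplicas) {
        // Assumption: keep the earliest recorded state if the target changes mid-flight.
        previous.putIfAbsent(partition, currentReplicas);
        znodes.put(partition, targetReplicas);
    }

    // "Cancel" by rewriting the reassignment to the recorded old replicas.
    // Returns false if this client has no record for the partition.
    boolean cancel(String partition) {
        List<Integer> old = previous.remove(partition);
        if (old == null) {
            return false;
        }
        znodes.put(partition, old);
        return true;
    }

    // Current contents of the (simulated) reassignment znode.
    List<Integer> target(String partition) {
        return znodes.get(partition);
    }
}
```

In a real client the two map writes would be a ZooKeeper write and some durable local record respectively; the point is only that the old replica list has to be captured before the reassignment is written.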

When the ISR includes all the replicas given as the contents of the reassignment znode ([10,11,15] in the example), the controller would remove the relevant child of /admin/reassignments (i.e. at the same time the controller currently updates/removes the /admin/reassign_partitions znode).
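The completion condition above amounts to a set-containment check. A minimal sketch, assuming replica and ISR lists are available as lists of broker ids (the class and method names are hypothetical, not controller code):

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper illustrating the completion condition; not controller code.
final class ReassignmentCompletion {
    // True when every replica named in the reassignment znode is in the ISR.
    // Order is irrelevant, and the ISR may contain additional brokers.
    static boolean isComplete(List<Integer> targetReplicas, List<Integer> isr) {
        return new HashSet<>(isr).containsAll(targetReplicas);
    }
}
```

With the target [10,11,15] from the example, an ISR of [15,10,11] satisfies the check, while an ISR of [10,11] does not.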

In this way the child nodes of /admin/reassignments correspond precisely to the partitions currently being reassigned. Moreover, the ZooKeeper czxid of a child node identifies a reassignment uniquely. It is then possible to determine whether that exact reassignment is still ongoing, or (using the mzxid) has been changed.
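The czxid/mzxid reasoning can be sketched as a small decision function. This is an illustration only: ZnodeStat is a hypothetical stand-in for the two relevant fields of ZooKeeper's Stat, and the status names are invented here. It relies on the ZooKeeper behaviour that a znode's mzxid equals its czxid until the data is first modified.

```java
// Hypothetical helper; ZnodeStat stands in for the czxid and mzxid fields of ZooKeeper's Stat.
final class ReassignmentIdentity {
    enum Status { ONGOING_UNCHANGED, ONGOING_CHANGED, GONE }

    static final class ZnodeStat {
        final long czxid; // zxid of the transaction that created the znode
        final long mzxid; // zxid of the transaction that last modified its data

        ZnodeStat(long czxid, long mzxid) {
            this.czxid = czxid;
            this.mzxid = mzxid;
        }
    }

    // expectedCzxid: the czxid the client recorded when the reassignment was created.
    // current: the stat of the child znode now, or null if it no longer exists.
    static Status check(long expectedCzxid, ZnodeStat current) {
        // Missing node, or a node created by a different transaction: the
        // reassignment the client knew about is no longer in progress.
        if (current == null || current.czxid != expectedCzxid) {
            return Status.GONE;
        }
        // Same node; a differing mzxid means its contents were rewritten.
        return current.mzxid == current.czxid ? Status.ONGOING_UNCHANGED : Status.ONGOING_CHANGED;
    }
}
```

A client that needs to detect changes made after its own earlier modification could record the mzxid it last observed and compare against that instead of against the czxid.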

Compatibility, Deprecation, and Migration Plan

In order to retain compatibility with existing software that understands /admin/reassign_partitions, the controller would be changed to create the child nodes of /admin/reassignments when the /admin/reassign_partitions znode is created. The code to update and delete /admin/reassign_partitions would also be retained, so the only difference to a client that operates on /admin/reassign_partitions would be a slight increase in latency due to the need to create the new znodes. This compatibility behaviour could be dropped in some future version of Kafka, if desired.

...