...

At this point, all of the brokers are running in ZK mode and their broker-controller communication channels operate as they would with a ZK controller. The ZK brokers will learn about this new controller by receiving an UpdateMetadataRequest from the new KRaft controller. From a broker’s perspective, the controller looks and behaves like a normal ZK controller.

...

In order to ensure consistency of the metadata, we must stop making any writes to ZK while we are migrating the data. This is accomplished by making the new KRaft controller the active ZK controller: it forcibly writes to the "/controller" and "/controller_epoch" ZNodes to claim leadership.
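
The mechanics of that takeover can be pictured with the plain ZooKeeper client as in the sketch below. This is only an illustration of the idea, not the actual controller code: the payloads are simplified placeholders and the sketch assumes "/controller" already exists.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class ControllerTakeoverSketch {
        // Illustrative only: bump "/controller_epoch" and rewrite "/controller" in a single
        // atomic multi-op so that the KRaft controller becomes the active ZK controller.
        // The version check on "/controller_epoch" guards against a concurrent ZK controller
        // racing this write; the payloads are simplified, not Kafka's real encodings.
        static void claimZkControllership(ZooKeeper zk, int kraftControllerId, int newEpoch,
                                          int expectedEpochZkVersion)
                throws KeeperException, InterruptedException {
            byte[] controllerPayload =
                    ("{\"brokerid\":" + kraftControllerId + "}").getBytes(StandardCharsets.UTF_8);
            byte[] epochPayload = Integer.toString(newEpoch).getBytes(StandardCharsets.UTF_8);

            zk.multi(Arrays.asList(
                    Op.setData("/controller_epoch", epochPayload, expectedEpochZkVersion),
                    Op.delete("/controller", -1),
                    Op.create("/controller", controllerPayload,
                              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)));
        }
    }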

Broker Migration

Following the migration of metadata and controller leadership to KRaft, the brokers are restarted one-by-one in KRaft mode. While this rolling restart is taking place, the cluster will be composed of both ZK and KRaft brokers. 

...

There is likely no reasonable way to put a limit on how long a cluster stays in a mixed state, since rolling restarts for large clusters may take several hours. It is also possible for the operator to revert back to ZK during this time.

...

Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK.
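
The dual-write behavior can be pictured roughly as in the sketch below. All of the type and method names are hypothetical stand-ins rather than real controller classes; the point is only the ordering: a metadata change is committed to the KRaft metadata log first and is mirrored to ZooKeeper afterwards.

    import java.util.List;

    // Hypothetical sketch of dual writes while the controller is in migration mode.
    // None of these interfaces are real Kafka classes; they only illustrate the ordering.
    class DualWriteSketch {
        interface RaftClient {
            long append(List<byte[]> records);            // returns the last offset of the batch
            void awaitCommit(long offset) throws InterruptedException;
        }
        interface ZkMirrorClient {
            void mirror(List<byte[]> records);            // write the equivalent state to ZK
        }

        private final RaftClient raftClient;
        private final ZkMirrorClient zkClient;

        DualWriteSketch(RaftClient raftClient, ZkMirrorClient zkClient) {
            this.raftClient = raftClient;
            this.zkClient = zkClient;
        }

        void handleMetadataChange(List<byte[]> records) throws InterruptedException {
            // 1. The KRaft metadata log is the source of truth, so commit there first.
            long offset = raftClient.append(records);
            raftClient.awaitCommit(offset);
            // 2. Only after the records are committed, mirror the same state to ZooKeeper
            //    so that ZK brokers continue to see consistent metadata.
            zkClient.mirror(records);
        }
    }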

The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded.

Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting kafka.metadata.migration.enable to "false" (or unsetting it). Once the controller leaves migration mode, it will no longer perform writes to ZK and it will disable its special handling of ZK RPCs.

At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).

...

The ZK migration logic will need to deal with asynchronous topic deletions when migrating data. Normally, the ZK controller will complete these asynchronous deletions via TopicDeletionManager. If the KRaft controller takes over before a deletion has occurred, we will need to complete the deletion as part of the ZK to KRaft state migration. Once the migration is complete, we will need to finalize the deletion in ZK so that the state is consistent.
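
As a rough illustration (not the actual migration code), the sketch below reads the pending deletions from the "/admin/delete_topics" path, where the ZK controller tracks asynchronous topic deletions; the surrounding handling shown in the comments is assumed, not prescribed by this KIP.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class PendingDeletionSketch {
        // Illustrative only: collect topics with pending asynchronous deletions so that the
        // migration can treat them as deleted (rather than copying them into the metadata log)
        // and later remove their ZNodes so ZK ends up consistent with KRaft.
        static Set<String> topicsPendingDeletion(ZooKeeper zk)
                throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren("/admin/delete_topics", false);
            return new HashSet<>(children);
        }
    }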

Rollback to ZK

As mentioned above, it should be possible for the operator to roll back to ZooKeeper at any point in the migration process prior to taking the KRaft controllers out of migration mode. The procedure for rolling back is to reverse the steps of the migration that have been completed so far.

  • Brokers should be restarted one by one in ZK mode
  • The KRaft controller quorum should be cleanly shut down
  • The operator can remove the persistent "/controller" and "/controller_epoch" ZNodes, allowing a ZK controller election to take place (a sketch of this step follows below)

A clean shutdown of the KRaft quorum is important because there may be uncommitted metadata waiting to be written to ZooKeeper. A forceful shutdown could cause that metadata to be lost, potentially leading to data loss.
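
The last step of the procedure above can be illustrated with the plain ZooKeeper client as follows; again, this is a sketch rather than a supported tool.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class RollbackSketch {
        // Illustrative only: delete the controller ZNodes written by the KRaft controller so
        // that a normal ZK controller election can take place among the ZK brokers.
        // A version of -1 deletes the node regardless of its current ZNode version.
        static void releaseZkControllership(ZooKeeper zk)
                throws KeeperException, InterruptedException {
            zk.delete("/controller", -1);
            zk.delete("/controller_epoch", -1);
        }
    }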

Failure Modes

There are a few failure scenarios to consider during the migration. The KRaft controller can crash while initially copying the data from ZooKeeper, the controller can crash some time after the initial migration, and the controller can fail to write new metadata back to ZK.

...

It is also possible for a write to ZK to fail. In this case, we will want to stop making updates to the metadata log to avoid unbounded lag between KRaft and ZooKeeper. Since ZK brokers will be reading data like ACLs and dynamic configs from ZooKeeper, we should limit the amount of divergence between ZK and KRaft brokers by setting a bound on the amount of lag between KRaft and ZooKeeper.
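
One way to picture such a bound is sketched below; the names and the record-count threshold are purely hypothetical, not actual Kafka code or configurations.

    // Hypothetical sketch of bounding the lag between the KRaft metadata log and ZooKeeper.
    class ZkLagBoundSketch {
        private final long maxAllowedLagRecords;
        private volatile long lastCommittedKRaftOffset;
        private volatile long lastOffsetMirroredToZk;

        ZkLagBoundSketch(long maxAllowedLagRecords) {
            this.maxAllowedLagRecords = maxAllowedLagRecords;
        }

        void onKRaftCommit(long offset)    { lastCommittedKRaftOffset = offset; }
        void onZkWriteSuccess(long offset) { lastOffsetMirroredToZk = offset; }

        // New metadata writes are paused while ZooKeeper is too far behind the metadata log.
        boolean shouldAcceptNewMetadataWrites() {
            return (lastCommittedKRaftOffset - lastOffsetMirroredToZk) <= maxAllowedLagRecords;
        }
    }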

Test Plan

In addition to basic "happy path" tests, we will also want to test that the migration can tolerate failures of brokers and KRaft controllers. We will also want to have tests for the correctness of the system if ZooKeeper becomes unavailable during the migration. Another class of tests for this process is metadata consistency at the broker level. Since we are supporting ZK and KRaft brokers simultaneously, we need to ensure their metadata does not stay inconsistent for very long.

Rejected Alternatives

Offline Migration

...