...

Here is a state machine description of the migration. The controller will likely use more internal states, but the states listed here will be exposed through the ZkMigrationState metric.


State | Enum | Description
None | 0 | This cluster started out as KRaft and was not migrated.
MigrationIneligible | 1 | The brokers and controllers do not meet the migration criteria. The cluster is operating in ZooKeeper mode.
MigratingZkData | 2 | The controller is copying data from ZooKeeper into KRaft.
DualWriteMetadata | 3 | The controller is in KRaft mode making dual writes to ZooKeeper.
MigrationFinalized | 4 | The cluster has been migrated to KRaft mode.
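As an illustration only, the state table can be written as a Java enum that maps each state to the integer reported by the metric. The type name `ZkMigrationStateSketch`, the constant spellings, and the `fromValue` helper are hypothetical and are not taken from the Kafka code base; only the state names and numeric values come from the table.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the states exposed through the ZkMigrationState metric.
 * The numeric values mirror the table; everything else is an assumption.
 */
enum ZkMigrationStateSketch {
    NONE(0),                 // cluster started out as KRaft and was not migrated
    MIGRATION_INELIGIBLE(1), // migration criteria not met, still in ZooKeeper mode
    MIGRATING_ZK_DATA(2),    // controller is copying data from ZooKeeper into KRaft
    DUAL_WRITE_METADATA(3),  // KRaft controller is making dual writes to ZooKeeper
    MIGRATION_FINALIZED(4);  // cluster has been migrated to KRaft mode

    private final int value;

    ZkMigrationStateSketch(int value) {
        this.value = value;
    }

    /** Integer that would be reported as the metric value. */
    int value() {
        return value;
    }

    /** Looks up a state by its metric value; empty if the value is unknown. */
    static Optional<ZkMigrationStateSketch> fromValue(int value) {
        for (ZkMigrationStateSketch state : values()) {
            if (state.value == value) {
                return Optional.of(state);
            }
        }
        return Optional.empty();
    }
}
```

A monitoring tool could use a lookup like `fromValue` to turn the raw gauge value back into a readable state name.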

...

The metadata copied from ZK will be encapsulated in a single metadata transaction (KIP-868). A ZkMigrationRecord will also be included in this transaction.

...

Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting zookeeper.metadata.migration.enable to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs.
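The "all members of the quorum have signaled" condition can be sketched as a simple predicate. This is a minimal illustration, not the controller's actual logic: the class name, the method name, and the representation of the signal as a `Map<Integer, Boolean>` (node ID to "migration still enabled", as learned from the tagged field) are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the finalization check performed by the active controller. */
final class FinalizationCheckSketch {

    private FinalizationCheckSketch() {
    }

    /**
     * @param quorumVoters           node IDs of all controller quorum members
     * @param migrationEnabledByNode last observed value per node: true if the node
     *                               still reports migration mode as enabled
     * @return true only if every voter has been heard from and none of them
     *         still has migration mode enabled
     */
    static boolean readyToFinalize(Set<Integer> quorumVoters,
                                   Map<Integer, Boolean> migrationEnabledByNode) {
        if (quorumVoters.isEmpty()) {
            return false; // nothing to base a decision on
        }
        for (int voter : quorumVoters) {
            Boolean enabled = migrationEnabledByNode.get(voter);
            // A voter with no signal yet, or one still in migration mode, blocks finalization.
            if (enabled == null || enabled) {
                return false;
            }
        }
        return true;
    }
}
```

Treating a missing signal the same as "still migrating" keeps the check conservative: a controller that has not yet been restarted cannot be finalized around.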

...

Since these two versions contain the same data, but with different field names, we can simply support v0 and v1 in KRaft brokers and avoid modifying the file on disk. By leaving this file unchanged, we better facilitate a downgrade to ZK during the migration. Once the controller has completed the migration and written the final ZkMigrationRecord, the brokers can rewrite their meta.properties files as v1 in their log directories.
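One way to picture "support v0 and v1 without modifying the file" is a reader that accepts either layout and normalizes it in memory. The sketch below assumes that v0 stores the ID under `broker.id` and v1 under `node.id`, with `version` and `cluster.id` present in both; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: read meta.properties v0 or v1 into one in-memory form. */
final class MetaPropertiesSketch {
    final int version;
    final int nodeId;
    final String clusterId;

    private MetaPropertiesSketch(int version, int nodeId, String clusterId) {
        this.version = version;
        this.nodeId = nodeId;
        this.clusterId = clusterId;
    }

    /** Parses file contents; the file on disk is never rewritten here. */
    static MetaPropertiesSketch parse(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int version = Integer.parseInt(require(props, "version"));
        String idKey;
        switch (version) {
            case 0:
                idKey = "broker.id"; // assumed v0 field name
                break;
            case 1:
                idKey = "node.id";   // assumed v1 field name
                break;
            default:
                throw new IllegalArgumentException("Unsupported version: " + version);
        }
        int nodeId = Integer.parseInt(require(props, idKey));
        return new MetaPropertiesSketch(version, nodeId, require(props, "cluster.id"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required key: " + key);
        }
        return value.trim();
    }
}
```

Because both versions collapse to the same `nodeId` and `clusterId`, a broker using a reader like this could run in KRaft mode against an untouched v0 file, which is what keeps a downgrade to ZK straightforward.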

Rollback to ZK

As mentioned above, it should be possible for the operator to roll back to ZooKeeper at any point in the migration process prior to taking the KRaft controllers out of migration mode. The procedure for rolling back is to reverse the migration steps that have been completed so far.

...

If a migration has been started, but a KRaft controller is elected that is misconfigured (does not have zookeeper.metadata.migration.enable or ZK configs), this controller should resign. When replaying the metadata log during its initialization phase, this controller can see that a migration is in progress by seeing the initial ZkMigrationRecord. Since it does not have the required configs, it can resign leadership and throw an error.

If a migration has been finalized, but the KRaft quorum comes up with zookeeper.metadata.migration.enable set, we must not re-enter migration mode. In this case, while replaying the log, the controller can see the second ZkMigrationRecord and know that the migration is finalized and should not be resumed. This should result in errors being thrown, but the quorum can continue operating as normal.
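The two misconfiguration cases above amount to a small decision table over what the controller saw while replaying the log and whether it has the migration configs. The following is a rough sketch under that reading; the enum names, the `decide` method, and the handling of the "not started" case (which the text above does not cover) are assumptions for illustration.

```java
/** Hypothetical sketch of the checks a controller could make after log replay. */
final class MigrationStartupCheckSketch {

    /** What the replayed log says about the migration. */
    enum Progress {
        NOT_STARTED, // no ZkMigrationRecord seen
        IN_PROGRESS, // only the initial ZkMigrationRecord seen
        FINALIZED    // the second (final) ZkMigrationRecord seen
    }

    enum Action {
        PROCEED,                 // configuration matches the log, nothing special to do
        RESIGN_WITH_ERROR,       // migration in progress but required configs are missing
        STAY_FINALIZED_LOG_ERROR // finalized; report an error but do not re-enter migration
    }

    private MigrationStartupCheckSketch() {
    }

    /**
     * @param progress            migration progress inferred from the metadata log
     * @param migrationConfigured true if zookeeper.metadata.migration.enable and the
     *                            ZK configs are present on this controller
     */
    static Action decide(Progress progress, boolean migrationConfigured) {
        switch (progress) {
            case IN_PROGRESS:
                return migrationConfigured ? Action.PROCEED : Action.RESIGN_WITH_ERROR;
            case FINALIZED:
                return migrationConfigured ? Action.STAY_FINALIZED_LOG_ERROR : Action.PROCEED;
            default:
                // NOT_STARTED is outside the two cases described above; assumed to proceed.
                return Action.PROCEED;
        }
    }
}
```

Keeping the decision as a pure function of log state and configuration makes both failure cases easy to unit test without a running quorum.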

...