Comment: Removed MigrationCheck RPC, added UMR version bump, add broker registration JSON bump, clarified many things from discussion on mailing list


Current state: In Discussion

Discussion thread:




MBean name | Description

 | An enumeration of: ZooKeeper (1), Dual (2), KRaft (3). Each broker reports this.

kafka.controller:type=KafkaController,name=Features,feature={feature},level={level} | The finalized set of features with their level as seen by the controller. Used to help operators see the cluster's current metadata.version.
kafka.controller:type=KafkaController,name=MigrationState | An enumeration of the possible migration states the cluster can be in. This is only reported by the active controller. The "ZooKeeper" and "MigrationEligible" states are reported by the ZK controller, while the remaining states are reported by the KRaft controller.
kafka.controller:type=KafkaController,name=ZooKeeperWriteBehindLag | The amount of lag in records that ZooKeeper is behind relative to the highest committed record in the metadata log. This metric will only be reported by the active KRaft controller.
kafka.controller:type=KafkaController,name=ZooKeeperBlockingKRaftMillis | The number of milliseconds a write to KRaft has been blocked due to lagging ZooKeeper writes. This metric will only be reported by the active KRaft controller.

MetadataVersion (IBP)

A new MetadataVersion in the 3.4 line will be added. This version will be used for a few things in this design.


All brokers must be running at least this MetadataVersion before the migration can begin.


ZK brokers will specify their MetadataVersion via the IBP as usual. The KRaft controller will bootstrap with the same MetadataVersion (which is stored in the metadata log as a feature flag – see KIP-778: KRaft to KRaft Upgrades).


A new “kafka.metadata.migration.enable” config will be added for the ZK broker and KRaft controller. Its default will be “false”. Setting this config to “true” on each broker is a prerequisite to starting the migration. Setting this to "true" on the KRaft controllers is the trigger for starting the migration (more on that below).

Setting this to "true" (or "false") on a KRaft broker has no effect.

ZK Broker Registration JSON

In order to inform the KRaft controller that a ZK broker is ready for migration, a new version of the broker registration JSON will be added. This new version (6) will add a kraftMigration object. The object will include properties needed by the KRaft controller to begin the migration. The usage of this new version (and field) will be gated on the MetadataVersion introduced by this KIP.

Code Block
{
  "version": 6,
  "host": "broker01",
  "port": 9092,
  "jmx_port": 9999,
  "timestamp": 2233345666,
  "endpoints": [ ],
  "rack": "",
  "features": {},
  "kraftMigration": { // <-- New object
    "isReady": true,
    "clusterId": "uKMoqJEZRSWt0uDX44O5Wg",
    "ibp": "3.4-IV0"
  }
}

Operational Changes

Forwarding Enabled on Brokers

As detailed in KIP-500 and KIP-590, all brokers (ZK and KRaft) must forward administrative requests such as CreateTopics to the active KRaft controller once the migration has started. When running the new metadata.version defined in this KIP, all brokers will enable forwarding.

Migration Trigger

The migration from ZK to KRaft will be triggered by the cluster's state. To start a migration, the cluster must meet some requirements:

  1. The metadata.version is set to the version added by this KIP. This indicates the software is at a minimum version which includes the necessary logic to perform the migration.
  2. All ZK brokers have kafka.metadata.migration.enable set to “true”. This indicates an operator has declared some intention to start the migration.
  3. No brokers are offline (we will use offline replicas as a proxy for this).

Once these conditions are satisfied, an operator can start a KRaft quorum with kafka.metadata.migration.enable set to “true” to begin the migration.

By utilizing configs and broker/controller restarts, we follow a paradigm that Kafka operators are familiar with.
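
The three requirements above can be sketched as a simple pre-flight check. This is only an illustration: the `ZkBrokerView` type and the `migration_blockers` function are hypothetical names invented here, not part of Kafka, and a real check would read these values from the cluster rather than from in-memory objects.

```python
from dataclasses import dataclass


@dataclass
class ZkBrokerView:
    """Hypothetical snapshot of one ZK broker, as an operator tool might see it."""
    broker_id: int
    migration_enabled: bool  # value of kafka.metadata.migration.enable
    online: bool


def migration_blockers(metadata_version_ok, brokers):
    """Return the reasons the migration cannot start yet (empty list = ready)."""
    blockers = []
    # Requirement 1: the cluster runs the metadata.version added by this KIP.
    if not metadata_version_ok:
        blockers.append("metadata.version is below the version added by this KIP")
    for b in brokers:
        # Requirement 2: every ZK broker has opted in via static config.
        if not b.migration_enabled:
            blockers.append(
                f"broker {b.broker_id}: kafka.metadata.migration.enable is not true")
        # Requirement 3: no broker is offline.
        if not b.online:
            blockers.append(f"broker {b.broker_id}: offline")
    return blockers
```

An empty result means an operator could go on to start the KRaft quorum with kafka.metadata.migration.enable set to “true”.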

Migration Overview

Here is a state machine description of the migration.

  1. The cluster is in ZooKeeper mode
  2. The cluster has been upgraded to a minimum software version and has set the necessary static configs
  3. The KRaft quorum has been started
  4. ZK state has been migrated, controller is in dual-write mode, brokers are being restarted in KRaft mode
  5. All of the brokers have been restarted in KRaft mode, controller still in dual-write mode
  6. The cluster is in KRaft mode


In addition to checking if a broker is ready for migration, these new properties are used by the KRaft controller to verify that the brokers and new KRaft controllers have valid configurations. 
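
As a rough sketch of how a controller might use these properties, the function below inspects a broker registration document. The function name is hypothetical, and the specific validity rule (the broker's clusterId must equal the controller's) is an assumption made for illustration; the KIP text only says the properties are used to verify configurations.

```python
import json


def broker_ready_for_migration(registration_json, expected_cluster_id):
    """Return True if a ZK broker registration advertises migration readiness.

    Assumption: a registration is acceptable when it uses version 6 or later,
    carries a kraftMigration object with isReady set, and reports the same
    clusterId as the controller.
    """
    reg = json.loads(registration_json)
    if reg.get("version", 0) < 6:
        return False  # older registrations have no kraftMigration object
    migration = reg.get("kraftMigration")
    if not isinstance(migration, dict):
        return False
    return bool(migration.get("isReady")) and \
        migration.get("clusterId") == expected_cluster_id
```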


A new version of the UpdateMetadataRequest RPC will be added which adds the field KRaftControllerId. This field will point to the active KRaft controller. If this field is set, the ControllerId field should be -1.

Code Block
  "apiKey": 6,
  "type": "request",
  "listeners": ["zkBroker"],
  "name": "UpdateMetadataRequest",
  "validVersions": "0-8",  <-- New version 8
  "flexibleVersions": "6+",
  "fields": [
    { "name": "ControllerId", "type": "int32", "versions": "0+", "entityType": "brokerId",
      "about": "The controller id." },
--> { "name": "KRaftControllerId", "type": "int32", "versions": "8+", "entityType": "brokerId",
      "about": "The KRaft controller id, used during migration." }, <-- New field
    { "name": "ControllerEpoch", "type": "int32", "versions": "0+",
      "about": "The controller epoch." },

Additional ZK Broker Configs 

To support connecting to a KRaft controller for requests such as AlterPartitions, the ZK brokers will need additional configs:

  • controller.quorum.voters: comma-separated list of "node@host:port" (the same as KRaft brokers would set)
  • controller.listener.names: a comma-separated list of listeners used by the controller
  • Corresponding entries for the listeners given in controller.listener.names
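
To make the "node@host:port" format concrete, here is a small parsing sketch. Both function names are hypothetical, and treating the node part as an integer id is an assumption; this is not how the broker itself parses the config.

```python
REQUIRED_MIGRATION_CONFIGS = ("controller.quorum.voters", "controller.listener.names")


def missing_migration_configs(props):
    """Return the required controller configs that are absent or empty."""
    return [key for key in REQUIRED_MIGRATION_CONFIGS if not props.get(key)]


def parse_quorum_voters(value):
    """Parse 'node@host:port,node@host:port' into {node_id: (host, port)}."""
    voters = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        node, at, address = entry.partition("@")
        host, colon, port = address.rpartition(":")
        if not (at and colon and host and node.isdigit() and port.isdigit()):
            raise ValueError(f"malformed voter entry: {entry!r}")
        voters[int(node)] = (host, int(port))
    return voters
```

For example, `parse_quorum_voters("1@ctrl1:9093,2@ctrl2:9093")` yields one `(host, port)` pair per controller node.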

Broker Registration

While running in migration mode, we must synchronize broker registration information bidirectionally between ZK and KRaft. 


Since we require controller forwarding for this KIP, we can use the KRaft approach of returning a random broker (ZK or KRaft) as the ControllerId for clients via MetadataResponse and rely on forwarding for write operations.

For inter-broker requests such as AlterPartitions and ControlledShutdown, we do not want to add the overhead of forwarding, so we'll want to include the actual controller in the UpdateMetadataRequest. However, we cannot simply include the KRaft controller as the ControllerId. The ZK brokers connect to a ZK controller by using the "" config and the node information from LiveBrokers in the UpdateMetadataRequest. For connecting to a KRaft controller, the ZK brokers will need to use the "controller.listener.names" and "controller.quorum.voters" configs. To allow this, we will use the new KRaftControllerId field in UpdateMetadataRequest.
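
The selection logic a ZK broker would need can be sketched as follows. The function, the dictionary shapes, and the use of -1 as the "unset" value for KRaftControllerId are assumptions made for illustration; only the two field names and the two sources of address information come from the text above.

```python
def resolve_controller(update_metadata, live_brokers, quorum_voters):
    """Pick the controller endpoint for inter-broker requests.

    update_metadata: dict with 'ControllerId' and optionally 'KRaftControllerId'
    live_brokers:    {broker_id: (host, port)} taken from LiveBrokers
    quorum_voters:   {node_id: (host, port)} taken from controller.quorum.voters
    """
    kraft_id = update_metadata.get("KRaftControllerId", -1)
    if kraft_id >= 0:
        # KRaft controller: its address comes from static config, not LiveBrokers.
        return ("kraft", quorum_voters[kraft_id])
    # ZK controller: an ordinary broker, so LiveBrokers has its address.
    return ("zk", live_brokers[update_metadata["ControllerId"]])
```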

Unavailable ZooKeeper

While in the dual-write mode, it is possible for a write to ZK to fail. In this case, we will want to stop making updates to the metadata log to avoid unbounded lag between KRaft and ZooKeeper. Since ZK brokers will be reading data like ACLs and dynamic configs from ZooKeeper, we should limit the amount of divergence between ZK and KRaft brokers by setting a bound on the amount of lag between KRaft and ZooKeeper.
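
One way to picture the lag bound is a gate that the controller consults before appending to the metadata log. This is a minimal sketch: the `DualWriteGate` class and the `max_lag_records` parameter are hypothetical, and the KIP does not specify how the bound is configured or enforced.

```python
class DualWriteGate:
    """Blocks metadata log writes once ZooKeeper falls too far behind."""

    def __init__(self, max_lag_records):
        self.max_lag_records = max_lag_records  # hypothetical bound
        self.committed_offset = 0  # highest record committed to the metadata log
        self.zk_offset = 0         # highest record successfully written to ZK

    @property
    def lag(self):
        """Records by which ZooKeeper trails the metadata log."""
        return self.committed_offset - self.zk_offset

    def can_write_metadata(self):
        return self.lag < self.max_lag_records

    def record_metadata_commit(self, count=1):
        if not self.can_write_metadata():
            raise RuntimeError("metadata writes blocked: ZooKeeper is lagging")
        self.committed_offset += count

    def record_zk_write(self, offset):
        # A successful ZK write advances the mirrored offset and may unblock writes.
        self.zk_offset = max(self.zk_offset, offset)
```

In this sketch, `lag` plays the role of the ZooKeeperWriteBehindLag metric, and the time spent with `can_write_metadata()` returning false corresponds to ZooKeeperBlockingKRaftMillis.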

Incompatible Brokers

At any time during the migration, it is possible for an operator to bring up an incompatible broker. This could be a new or existing broker. In this event, the KRaft controller will see the broker registration in ZK, but it will not send it any RPCs. Because the controller refuses to send it UpdateMetadata or LeaderAndIsr RPCs, this broker will be effectively fenced from the rest of the cluster.

Test Plan

In addition to basic "happy path" tests, we will also want to test that the migration can tolerate failures of brokers and KRaft controllers. We will also want to have tests for the correctness of the system if ZooKeeper becomes unavailable during the migration. Another class of tests for this process is metadata consistency at the broker level. Since we are supporting ZK and KRaft brokers simultaneously, we need to ensure their metadata does not stay inconsistent for very long.
