Status

Current state: Draft

Discussion thread:

JIRA:

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

To complete the plan for KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum, we need a way to migrate Kafka clusters from a ZooKeeper quorum to a KRaft quorum. This must be done without affecting partition availability and with minimal impact on operators and client applications.

In order to give users more confidence about undertaking the migration to KRaft, we will allow a rollback to ZooKeeper until the final step of the migration. This is accomplished by writing two copies of the metadata during the migration – one to the KRaft quorum, and one to ZooKeeper.

This KIP defines the behavior and set of new APIs for the “bridge release” as first mentioned in KIP-500. 



Public Interfaces

New metadata.version (IBP)

This KIP introduces a new metadata.version. It gates two behaviors described later in this document: the MigrationCheck RPC and request forwarding on brokers.

All brokers must be running this metadata.version before the migration can begin.

Migration-mode configuration

A new “kafka.metadata.migration.enable” config will be added for the broker and controller. Its default will be “false”. Setting this config to “true” on the brokers is a prerequisite to starting the migration. Setting this to "true" on the KRaft controllers is the trigger for starting the migration (more on that below).

MigrationCheck RPC

Brokers will use the new metadata.version to enable a new MigrationCheck RPC. This RPC will be used by the KRaft controller to determine if the cluster is ready to be migrated. The response will include the cluster ID and a boolean indicating if the migration mode config has been enabled statically on this broker.

The purpose of this RPC is to signal that a broker is able to be migrated. When the KRaft controller begins the migration process, it will first check that the live brokers are able to be migrated.

Request:

{
  "apiKey": TBD,
  "type": "request",
  "name": "MigrationCheckRequest",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [ ]
}

Response:

{
  "apiKey": TBD,
  "type": "response",
  "name": "MigrationCheckResponse",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [ 
    {"name": "clusterId", "type": "uuid", "versions": "0+"},
    {"name": "configEnabled", "type": "bool", "versions": "0+"}
  ]
}
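
The readiness check described above can be sketched as follows. This is a minimal illustration, not the controller implementation: the class and function names are hypothetical, and both comparing each response against an expected cluster ID and treating an empty set of responses as "not ready" are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MigrationCheckResponse:
    """Hypothetical in-memory mirror of the two response fields above."""
    cluster_id: str
    config_enabled: bool


def brokers_ready(expected_cluster_id: str,
                  responses: Iterable[MigrationCheckResponse]) -> bool:
    """Return True only if every live broker can be migrated.

    A broker counts as ready when it reports the expected cluster ID
    (assumed check) and has the migration config enabled.
    """
    responses = list(responses)
    if not responses:
        # Assumption: with no live brokers responding, do not start.
        return False
    return all(
        r.cluster_id == expected_cluster_id and r.config_enabled
        for r in responses
    )
```

A single broker that has not set the config, or that belongs to a different cluster, is enough to hold the migration back in this sketch.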

Migration State ZNode

As part of the propagation of KRaft metadata back to ZooKeeper while in dual-write mode, we need to keep track of what has been synchronized. A new ZNode will be introduced to keep track of which KRaft record offset has been written back to ZK. This will be used to recover the synchronization state following a KRaft controller failover.

ZNode /migration

{
  "lastOffset": 100,
  "lastTimestamp": "2022-01-01T00:00:00.000Z",
  "kraftControllerId": 3000,
  "kraftControllerEpoch": 1
}
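
One way a newly elected KRaft controller might use this ZNode after a failover is sketched below. The function name and the representation of the metadata log as (offset, record) pairs are hypothetical, and the sketch assumes lastOffset is the offset of the last record already written to ZK, so only later records need to be written back.

```python
import json
from typing import List, Tuple


def records_to_resync(znode_payload: bytes,
                      log: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return the log records that still need to be written back to ZK.

    znode_payload is the JSON content of the /migration ZNode.
    log is a hypothetical list of (offset, record) pairs in offset order.
    """
    state = json.loads(znode_payload)
    last_offset = state["lastOffset"]
    # Everything up to and including lastOffset is assumed to be in ZK already.
    return [(offset, record) for offset, record in log if offset > last_offset]
```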

Controller ZNodes

The two controller ZNodes "/controller" and "/controller_epoch" will be managed by the KRaft quorum during the migration. Rather than using an ephemeral ZNode, the KRaft controller will use a persistent ZNode for "/controller" to prevent ZK brokers from attempting to become the active controller. The "/controller_epoch" ZNode will be managed by the active KRaft controller and incremented any time a new KRaft controller is elected.
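
The bookkeeping on election can be illustrated with an in-memory stand-in for ZooKeeper. This is only a sketch: the dictionary, the function name, and the plain-string ZNode contents are assumptions made for illustration, and a real implementation would need conditional ZooKeeper writes rather than unguarded updates.

```python
from typing import Dict


def claim_controller(znodes: Dict[str, str], kraft_controller_id: int) -> int:
    """Record a newly elected KRaft controller and bump the epoch.

    znodes is a hypothetical in-memory map of ZNode path to content,
    standing in for persistent ZNodes. Returns the new controller epoch.
    """
    new_epoch = int(znodes.get("/controller_epoch", "0")) + 1
    # Persistent (not ephemeral) entry, so it outlives the writer's session.
    znodes["/controller"] = str(kraft_controller_id)
    znodes["/controller_epoch"] = str(new_epoch)
    return new_epoch
```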

Operational Changes

Forwarding Enabled on Brokers

As detailed in KIP-500 and KIP-590, all brokers (ZK and KRaft) must forward administrative requests such as CreateTopics to the active KRaft controller once the migration has started. When running the new metadata.version defined in this KIP, all brokers will enable forwarding.

Migration Trigger

The migration from ZK to KRaft will be triggered by the cluster's state. To start a migration, the cluster must meet two requirements:

  1. The metadata.version is set to the version added by this KIP. This indicates that the software is at a minimum version that includes the logic needed to perform the migration.
  2. All ZK brokers have kafka.metadata.migration.enable set to “true”. This indicates that an operator has declared an intention to start the migration.

Once these conditions are satisfied, an operator can start a KRaft quorum with kafka.metadata.migration.enable set to “true” to begin the migration.

By utilizing configs and broker/controller restarts, we follow a paradigm that Kafka operators are familiar with.

Migration Overview

Here is a state machine description of the migration. 


State | Description
----- | -----------
ZooKeeperMode | The cluster is in ZooKeeper mode
MigrationEligible | The cluster has been upgraded to a minimum software version and has set the necessary static configs
MigrationReady | The KRaft quorum has been started
MigrationActive | ZK state has been migrated, controller is in dual-write mode, brokers are being restarted in KRaft mode
MigrationFinished | All of the brokers have been restarted in KRaft mode, controller still in dual-write mode
KRaftMode | The cluster is in KRaft mode
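
The states above can be modeled as a small enum. This sketch assumes the states are traversed strictly forward in the order listed; that ordering is an assumption, and rollback transitions are deliberately not modeled because the text above does not enumerate them. The names MigrationState, next_state, and in_dual_write are hypothetical.

```python
from enum import Enum


class MigrationState(Enum):
    ZOOKEEPER_MODE = "ZooKeeperMode"
    MIGRATION_ELIGIBLE = "MigrationEligible"
    MIGRATION_READY = "MigrationReady"
    MIGRATION_ACTIVE = "MigrationActive"
    MIGRATION_FINISHED = "MigrationFinished"
    KRAFT_MODE = "KRaftMode"


# Enum members iterate in definition order, which matches the list above.
_ORDER = list(MigrationState)


def next_state(state: MigrationState) -> MigrationState:
    """Return the following state, assuming strictly forward progress."""
    idx = _ORDER.index(state)
    if idx == len(_ORDER) - 1:
        raise ValueError(f"{state.value} is the final state")
    return _ORDER[idx + 1]


def in_dual_write(state: MigrationState) -> bool:
    """Per the descriptions above, the controller dual-writes in these states."""
    return state in (MigrationState.MIGRATION_ACTIVE,
                     MigrationState.MIGRATION_FINISHED)
```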


And a state machine diagram:



Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Test Plan

Describe in a few sentences how the KIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
