Status

Current state: Draft

Discussion thread:

JIRA:

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

To complete the plan for KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum, we need a way to migrate Kafka clusters from a ZooKeeper quorum to a KRaft quorum. The migration must not affect partition availability and should have minimal impact on operators and client applications.

To give users more confidence in undertaking the migration to KRaft, we will allow a rollback to ZooKeeper until the final step of the migration. This is accomplished by writing two copies of the metadata during the migration: one to the KRaft quorum and one to ZooKeeper.

This KIP defines the behavior and set of new APIs for the “bridge release” as first mentioned in KIP-500. 



Public Interfaces

New metadata.version (IBP)

This design introduces a new metadata.version. Among other things, it enables the MigrationCheck RPC described below.

All brokers must be running this metadata.version before the migration can begin. 

Migration-mode configuration

A new “kafka.metadata.migration.enable” config will be added for the broker and controller. Its default will be “false”. Setting this config to “true” on the brokers is a prerequisite to starting the migration. Setting this to "true" on the KRaft controllers is the trigger for starting the migration (more on that below).

MigrationCheck RPC

Brokers will use the new metadata.version to enable a new MigrationCheck RPC. This RPC will be used by the KRaft controller to determine if the cluster is ready to be migrated. The response will include the cluster ID and a boolean indicating if the migration mode config has been enabled statically on this broker.

The purpose of this RPC is to signal that a broker is able to be migrated. When the KRaft controller begins the migration process, it will first use this RPC to check that the live brokers are able to be migrated.

Request:

{
  "apiKey": TBD,
  "type": "request",
  "name": "MigrationCheckRequest",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [ ]
}

Response:

{
  "apiKey": TBD,
  "type": "response",
  "name": "MigrationCheckResponse",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [ 
    {"name": "clusterId", "type": "uuid", "versions": "0+"},
    {"name": "configEnabled", "type": "bool", "versions": "0+"}
  ]
}
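As an illustration only, the readiness check that the KRaft controller performs with these responses could be sketched as follows. This is a minimal Python sketch, not the proposed implementation: the names `MigrationCheckResponse`, `brokers_not_ready`, and `cluster_ready` are hypothetical, and both the cluster ID comparison and the treatment of an empty response set are assumptions about how the two response fields would be used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MigrationCheckResponse:
    """Hypothetical in-memory form of the two response fields."""
    cluster_id: str
    config_enabled: bool


def brokers_not_ready(expected_cluster_id: str,
                      responses: Dict[int, MigrationCheckResponse]) -> List[int]:
    """Return the sorted IDs of live brokers that would block the migration.

    Assumption: a broker blocks the migration if it reports a different
    cluster ID or has not enabled the migration config.
    """
    return sorted(
        broker_id
        for broker_id, resp in responses.items()
        if resp.cluster_id != expected_cluster_id or not resp.config_enabled
    )


def cluster_ready(expected_cluster_id: str,
                  responses: Dict[int, MigrationCheckResponse]) -> bool:
    """True only if at least one broker responded and none of them blocks.

    Assumption: an empty response set is treated as "not ready".
    """
    return bool(responses) and not brokers_not_ready(expected_cluster_id, responses)
```

Returning the list of blocking broker IDs, rather than a single boolean, would let the controller report which brokers still need the config change.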

Migration State ZNode

As part of the propagation of KRaft metadata back to ZooKeeper while in dual-write mode, we need to keep track of what has been synchronized. A new ZNode will be introduced to keep track of which KRaft record offset has been written back to ZooKeeper. This will be used to recover the synchronization state following a KRaft controller failover.

ZNode /migration

{
  "lastOffset": 100,
  "lastTimestamp": "2022-01-01T00:00:00.000Z",
  "kraftControllerId": 3000,
  "kraftControllerEpoch": 1
}
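To make the recovery step concrete, here is a minimal Python sketch of how a newly elected KRaft controller might read this ZNode and decide where to resume the write-back. The names `MigrationState`, `parse_migration_state`, and `next_offset_to_sync` are hypothetical. The sketch assumes that `lastOffset` is the last record already written to ZooKeeper, so synchronization resumes at the following offset.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationState:
    """Hypothetical in-memory form of the /migration ZNode payload."""
    last_offset: int
    last_timestamp: str
    kraft_controller_id: int
    kraft_controller_epoch: int


def parse_migration_state(data: bytes) -> MigrationState:
    """Decode the JSON payload stored in the /migration ZNode."""
    doc = json.loads(data)
    return MigrationState(
        last_offset=doc["lastOffset"],
        last_timestamp=doc["lastTimestamp"],
        kraft_controller_id=doc["kraftControllerId"],
        kraft_controller_epoch=doc["kraftControllerEpoch"],
    )


def next_offset_to_sync(state: MigrationState) -> int:
    """First KRaft record offset that still has to be written to ZooKeeper.

    Assumption: last_offset is inclusive, i.e. that record is already synced.
    """
    return state.last_offset + 1
```

With the example payload above, the new controller would resume the write-back at offset 101.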


 

Proposed Changes

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Test Plan

Describe in a few sentences how the KIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
