Comment: Add section on detecting ZK brokers

...

A new set of nodes will be provisioned to host the controller quorum. These controllers will be started with zookeeper.metadata.migration.enable set to “true”. Once the quorum is established and a leader is elected, the active controller will check that the whole quorum is ready to begin the migration. This is done by examining the new tagged field on the ApiVersionsResponse that is exchanged between controllers. Following this, the controller will examine the state of the ZK broker registrations to determine the set of extant ZK brokers, and will then wait for incoming BrokerRegistration requests (see section on ZK Broker Presence). Once all known ZK brokers have registered with the KRaft controller (and they are in a valid state), the migration process will begin.
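The precondition described above can be sketched as a single check. This is only an illustration under stated assumptions: the class name MigrationGate, the method readyToMigrate, and the representation of brokers as integer IDs are hypothetical and are not taken from the Kafka code base. The "valid state" check on each registration is deliberately left out.

```java
import java.util.Set;

public final class MigrationGate {
    private MigrationGate() {}

    /**
     * Hypothetical helper: the migration may begin only when the whole
     * controller quorum has signalled readiness and every known ZK broker
     * has registered with the KRaft controller.
     * Validation of each registration's state is not modelled here.
     */
    public static boolean readyToMigrate(boolean quorumReady,
                                         Set<Integer> knownZkBrokers,
                                         Set<Integer> registeredBrokers) {
        return quorumReady && registeredBrokers.containsAll(knownZkBrokers);
    }
}
```

Note that extra registrations (brokers the controller did not know about) do not block the check in this sketch; only a missing known broker does.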

...

Once the metadata migration is complete, the KRaft controller will begin operating normally.

ZK Broker Presence

When the KRaft controller comes up in migration mode, it will wait for all known ZK brokers to register themselves before starting the migration. The problem is that we cannot know precisely which ZK brokers exist. The broker registrations in ZK are ephemeral and only show the brokers that are currently alive. If an operator took the brokers offline and then started a migration, the controller would conclude that no brokers exist. To improve on this, we can add a heuristic based on the cluster metadata to better capture the full set of ZK brokers. By looking at the topic assignments and configurations, we can calculate the set of brokers that have partitions assigned to them or have a dynamic config. This approach is still imperfect, since a broker could be offline and have no assignments, but it will at least prevent partition unavailability caused by a broker that is running old software and cannot participate in the migration.
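A minimal sketch of this heuristic follows. The class ZkBrokerHeuristic, the method knownZkBrokers, and the in-memory shapes of the inputs (a map from topic name to per-partition replica lists, and a set of broker IDs with dynamic configs) are assumptions made for illustration; they do not correspond to actual Kafka classes. Combining the result with the live ephemeral registrations is also an assumption about how the two sources would be merged.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class ZkBrokerHeuristic {
    private ZkBrokerHeuristic() {}

    /**
     * Hypothetical helper: estimates the set of ZK brokers as the union of
     * (a) brokers with a live ephemeral registration,
     * (b) brokers that appear in any partition's replica assignment, and
     * (c) brokers that have a dynamic config.
     *
     * @param topicAssignments topic name -> one replica list per partition
     */
    public static Set<Integer> knownZkBrokers(
            Set<Integer> liveRegistrations,
            Map<String, List<List<Integer>>> topicAssignments,
            Set<Integer> brokersWithDynamicConfig) {
        Set<Integer> brokers = new TreeSet<>(liveRegistrations);
        for (List<List<Integer>> partitions : topicAssignments.values()) {
            for (List<Integer> replicas : partitions) {
                brokers.addAll(replicas);
            }
        }
        brokers.addAll(brokersWithDynamicConfig);
        return brokers;
    }
}
```

With no live registrations at all, a broker is still detected as long as it holds a replica or a dynamic config. A broker that is offline and has neither remains invisible, which is the limitation noted above.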

AdminClient, MetadataRequest, and Forwarding

...