Versions Compared

Comment: Avoid intermediate assign to ZERO in the Controller.

...

Footnote

The broker cannot run if this particular log directory is unavailable, and when configured separately it cannot host any user partitions, so there's no point in identifying it in the Controller.

Reserved UUIDs

The following UUIDs are excluded from the random pool when generating a log directory UUID:

  • UUID.UnknownDir – new Uuid(0L, 0L) – used to identify new or unknown assignments.
  • UUID.OfflineDir – new Uuid(0L, 1L) – used to represent unspecified offline directories.
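
The exclusion rule above can be sketched as follows. This is a minimal illustration, not the actual Kafka implementation: it uses java.util.UUID as a stand-in for Kafka's Uuid type, and the class name DirectoryIds and its methods are hypothetical.

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical helper; java.util.UUID stands in for Kafka's Uuid type.
final class DirectoryIds {
    // Reserved values, mirroring new Uuid(0L, 0L) and new Uuid(0L, 1L) above.
    static final UUID UNKNOWN_DIR = new UUID(0L, 0L);
    static final UUID OFFLINE_DIR = new UUID(0L, 1L);
    static final Set<UUID> RESERVED = Set.of(UNKNOWN_DIR, OFFLINE_DIR);

    static boolean isReserved(UUID id) {
        return RESERVED.contains(id);
    }

    // Draw random UUIDs until one falls outside the reserved set.
    // UUID.randomUUID() yields version 4 UUIDs, whose version bits already
    // rule out both reserved values, so the loop is purely defensive here.
    // It matters for a generator that draws all 128 bits at random.
    static UUID randomDirectoryId() {
        UUID id;
        do {
            id = UUID.randomUUID();
        } while (isReserved(id));
        return id;
    }
}
```

A caller would use DirectoryIds.randomDirectoryId() when a log directory has no directory.id yet.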

Metadata records

RegisterBrokerRecord and BrokerRegistrationChangeRecord will both have two new fields:

...

Although not explicitly specified in the schema, the default value for Directory is Uuid.UnknownDir, as that's the default value for UUID types.

...

A directory assignment to Uuid.UnknownDir conveys that the log directory is not yet known. The hosting Broker will eventually determine the hosting log directory and use AssignReplicasToDirs to update the assignment.

...

An AssignReplicasToDirs request including an assignment to Uuid.OfflineDir conveys that the Broker wants to correct a replica assignment into an offline log directory, which cannot be identified.

...

Because the broker is proactive in communicating any log directory assignment changes to the controller, the metadata should be up to date and correct when the controller is notified of a failed log directory. However, the consequences of some partition assignment being incorrect – due to some error or race condition – can be quite damaging, as the controller might not know to update leadership for that partition, leaving it unavailable for an indefinite amount of time. So, as a fallback mechanism, when handling a runtime directory failure, the broker must verify the assignments for the newly failed partitions against the latest metadata, and for any incorrect assignments, the broker will use AlterReplicaLogDirs to assign them to UUID.OfflineDir so that the controller can update leadership and ISR.
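
The verification step of this fallback can be sketched as below. This is an illustrative sketch only: the class FailedDirCheck, the use of plain strings for partition names, and java.util.UUID as a stand-in for Kafka's Uuid are all assumptions made for the example. It returns the partitions hosted in the failed directory whose metadata assignment points elsewhere, which are the ones the broker would then reassign to OfflineDir.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper; partition names are plain strings for brevity.
final class FailedDirCheck {
    // Returns the partitions physically hosted in the failed directory whose
    // assignment in the latest metadata is missing or names another directory.
    static List<String> mismatched(UUID failedDir,
                                   List<String> partitionsInFailedDir,
                                   Map<String, UUID> metadataAssignments) {
        List<String> result = new ArrayList<>();
        for (String partition : partitionsInFailedDir) {
            UUID assigned = metadataAssignments.get(partition);
            // equals(null) is false, so a missing assignment also counts.
            if (!failedDir.equals(assigned)) {
                result.add(partition);
            }
        }
        return result;
    }
}
```

Partitions whose metadata already points at the failed directory need no correction, since the controller learns of the failure through the heartbeat.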

...

  • If the partition is not assigned to a log directory (refers to Uuid.UnknownDir)
    • If the partition already exists, the broker uses the new RPC — AssignReplicasToDirs — to notify the controller to change the metadata assignment to the actual log directory.
    • If the partition does not exist, the broker selects a log directory and uses the new RPC — AssignReplicasToDirs — to notify the controller to create the metadata assignment to the actual log directory.
  • If the partition is assigned to an online log directory
    • If the partition does not exist it is created in the indicated log directory.
    • If the partition already exists in the indicated log directory and no future replica exists, then no action is taken.
    • If the partition already exists in the indicated log directory, and there is a future replica in another log directory, then the broker starts the process to replicate the current replica to the future replica.
    • If the partition already exists in another online log directory and is a future replica in the log directory indicated by the metadata, the broker will replace the current replica with the future replica after making sure that the future replica is fully caught up with the current replica.
    • If the partition already exists in another online log directory, the broker uses the new RPC, AssignReplicasToDirs, to notify the controller to change the metadata assignment to the actual log directory. The partition might have been moved to a different log directory whilst the broker was offline.
  • If the partition is assigned to an unknown log directory or refers to Uuid.OfflineDir
    • If there are offline log directories, no action is taken: the assignment refers to a log directory which may be offline, and we don't want to fill the remaining online log directories with replicas that existed in the offline ones.
    • If there are no offline directories, the broker selects a log directory and uses the new RPC, AssignReplicasToDirs, to notify the controller to create the metadata assignment to the actual log directory.
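
The decision list above can be read as a single function from the metadata assignment and the local state to an action. The following is a sketch under stated assumptions, not the broker's actual code: the class StartupReconciliation, the Action names, and the parameter shapes are hypothetical, java.util.UUID stands in for Kafka's Uuid, and the last branch follows the revised wording in which the broker itself selects a new directory.

```java
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of the startup reconciliation rules.
final class StartupReconciliation {
    enum Action {
        NOTIFY_ACTUAL_DIR,        // AssignReplicasToDirs with the directory that hosts the replica
        SELECT_DIR_AND_NOTIFY,    // pick a directory, then AssignReplicasToDirs
        CREATE_IN_ASSIGNED_DIR,
        START_FUTURE_REPLICATION,
        PROMOTE_FUTURE_REPLICA,   // only after the future replica has caught up
        NO_ACTION
    }

    static final UUID UNKNOWN_DIR = new UUID(0L, 0L);

    static Action decide(UUID assigned,
                         Optional<UUID> currentDir,  // directory holding the replica, if any
                         Optional<UUID> futureDir,   // directory holding a future replica, if any
                         Set<UUID> onlineDirs,
                         boolean anyOfflineDirs) {
        if (assigned.equals(UNKNOWN_DIR)) {
            return currentDir.isPresent() ? Action.NOTIFY_ACTUAL_DIR
                                          : Action.SELECT_DIR_AND_NOTIFY;
        }
        if (onlineDirs.contains(assigned)) {
            if (currentDir.isEmpty()) {
                return Action.CREATE_IN_ASSIGNED_DIR;
            }
            if (currentDir.get().equals(assigned)) {
                return futureDir.isPresent() ? Action.START_FUTURE_REPLICATION
                                             : Action.NO_ACTION;
            }
            // The replica lives in another online directory.
            boolean futureInAssigned = futureDir.map(assigned::equals).orElse(false);
            return futureInAssigned ? Action.PROMOTE_FUTURE_REPLICA
                                    : Action.NOTIFY_ACTUAL_DIR;
        }
        // Assignment is OfflineDir or a directory ID this broker does not know.
        return anyOfflineDirs ? Action.NO_ACTION : Action.SELECT_DIR_AND_NOTIFY;
    }
}
```

Keeping the rules in one pure function makes each bullet a testable case and keeps side effects, such as sending the RPC, outside the decision.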

If the broker is configured with multiple log directories it remains FENCED until it can verify that all partitions are assigned to the correct log directories in the cluster metadata. This excludes the log directory that hosts the cluster metadata topic, if it is configured separately to a different path — using metadata.log.dir.

...

For any new partitions, the active controller will use Uuid.UnknownDir as the initial value for log directory UUID for each replica. Each broker with multiple log.dirs hosting replicas then assigns a log directory UUID and communicates it back to the active controller using the new RPC AssignReplicasToDirs so that cluster metadata can be updated with the log directory assignment.

...

When a controller receives a BrokerHeartbeat request from a broker that indicates any UUIDs under the new OfflineLogDirs field, it will:

...

If the indicated log directory UUID is not one of the Broker's online log directories, then the replica is considered offline and the leader and ISR are updated accordingly, the same as when the BrokerHeartbeat indicates a new offline log directory. This should only happen in the exceptional case that a Broker's metadata cache shows an incorrect assignment for some replica during the handling of a failure for the actual directory that hosts that replica.

...

  • If there are no indicated online log directory UUIDs the request is invalid and the controller replies with an error: INVALID_REQUEST.
  • If multiple log directories are registered the broker will remain fenced until the controller learns of all the partition to log directory placements in that broker, i.e. no remaining replicas assigned to Uuid.UnknownDir. The broker will indicate these using the AssignReplicasToDirs RPC.

    • The broker remains fenced by not wanting to unfence itself in heartbeat requests until the number of mismatching replica to log directory assignments is zero. This number is represented by the new metric NumMismatchingReplicaToLogDirAssignments.
  • If multiple log directories are registered and some of them are new (not present in previous registration) then these log directories are assumed to be empty. If they are not, the broker will use the AssignReplicasToDirs RPC to correct the assignment and choose not to become UNFENCED before the metadata is correct.
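
The broker-side fencing rule in the list above can be sketched as a count plus a gate. This is an assumption-laden illustration: the class FencingGate, its method names, and the string partition keys are hypothetical, and java.util.UUID stands in for Kafka's Uuid. The count corresponds conceptually to the NumMismatchingReplicaToLogDirAssignments metric.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical helper modelling when a broker keeps itself fenced.
final class FencingGate {
    // Counts local replicas whose metadata assignment is missing or differs
    // from the directory that actually hosts them. An assignment to
    // UnknownDir differs from every real directory, so it is counted too.
    static long numMismatching(Map<String, UUID> metadataAssignments,
                               Map<String, UUID> actualDirs) {
        return actualDirs.entrySet().stream()
                .filter(e -> !e.getValue().equals(metadataAssignments.get(e.getKey())))
                .count();
    }

    // The broker keeps asking to stay fenced until the count reaches zero.
    static boolean wantFence(long mismatching) {
        return mismatching > 0;
    }
}
```

In this sketch the broker would recompute the count as metadata updates arrive and only stop requesting fencing in its heartbeats once wantFence returns false.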

...

  • As per KIP-866, a separate Controller quorum is set up first, and only then are the existing brokers reconfigured and upgraded.
  • When configured for the migration and while still in ZK mode, brokers will:
    • update meta.properties to generate and include directory.id;
    • send BrokerRegistrationRequest including the log directory UUIDs;
    • notify the controller of log directory failures via BrokerHeartbeatRequest.
  • During the migration, the controller:
    • persists log directories indicated in broker registration requests in the cluster metadata;
    • relies on heartbeat requests to detect log directory failure instead of monitoring the ZK znode for notifications;
    • still uses full LeaderAndIsr requests to process log directory failures for any brokers still running in ZK mode.
  • The brokers restarting into KRaft mode will want to stay fenced until their log directory assignments for all hosted partitions are persisted in the cluster metadata.
  • The active controller will also ensure that any given broker stays fenced until it learns of all partition to log directory assignments in that specific broker via the new AssignReplicasToDirs RPC.
  • During the migration, replicas are assumed and assigned to log directory Uuid.UnknownDir until the actual log directory is learnt by the active controller from a broker running in KRaft mode.

...