...
The following UUIDs are excluded from the random pool when generating a log directory UUID:
- Uuid.UNASSIGNED_DIR: new Uuid(0L, 0L), used to identify new or unknown assignments.
- Uuid.LOST_DIR: new Uuid(0L, 1L), used to represent unspecified offline directories.
- Uuid.MIGRATING_DIR: new Uuid(0L, 2L), used when transitioning from a previous state where directory assignment was not available, to designate that some directory was previously selected to host a partition, but we're not sure which one yet.
The first 100 UUIDs, other than the three listed above, are also reserved for future use.
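A minimal sketch of drawing a log directory UUID while skipping the reserved range, assuming "the first 100 UUIDs" means values whose most significant 64 bits are zero and whose least significant 64 bits are 0 to 99. Directory UUIDs are modelled with Python's uuid.UUID as plain 128-bit integers, and is_reserved and random_dir_uuid are hypothetical names, not Kafka APIs.

```python
import secrets
import uuid
from typing import Callable

# The three special values, as 128-bit integers (most significant half is zero).
UNASSIGNED_DIR = uuid.UUID(int=0)
LOST_DIR = uuid.UUID(int=1)
MIGRATING_DIR = uuid.UUID(int=2)

# Assumption: "the first 100 UUIDs" are the integers 0..99.
RESERVED_COUNT = 100


def is_reserved(dir_id: uuid.UUID) -> bool:
    """True for the reserved range, which includes the three special values."""
    return dir_id.int < RESERVED_COUNT


def random_dir_uuid(randbits: Callable[[int], int] = secrets.randbits) -> uuid.UUID:
    """Draw 128 random bits, retrying until the value is outside the reserved range."""
    while True:
        candidate = uuid.UUID(int=randbits(128))
        if not is_reserved(candidate):
            return candidate
```

The randbits parameter only exists so the retry path can be exercised with a fixed sequence; in practice the reserved range is hit with negligible probability.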
Metadata records
RegisterBrokerRecord and BrokerRegistrationChangeRecord will have a new field:
...
Although not explicitly specified in the schema, the default value for Directory is Uuid.UNASSIGNED_DIR (Uuid.ZERO), as that is the default value for UUID types.
...
A directory assignment to Uuid.UNASSIGNED_DIR conveys that the log directory is not yet known; the hosting broker will eventually determine the hosting log directory and use AssignReplicasToDirs to update the assignment.
...
An AssignReplicasToDirs request including an assignment to Uuid.LOST_DIR conveys that the broker wants to correct a replica assignment to an offline log directory that cannot be identified.
...
- the log directory UUID is Uuid.LOST_DIR
- the hosting broker's registration indicates multiple online log directories, i.e. brokerRegistration.LogDirs.length > 1
...
- If the partition is not assigned to a log directory (refers to Uuid.UNASSIGNED_DIR):
  - If the partition already exists, the broker uses the new AssignReplicasToDirs RPC to notify the controller to change the metadata assignment to the actual log directory.
  - If the partition does not exist, the broker selects a log directory and uses the new AssignReplicasToDirs RPC to notify the controller to create the metadata assignment to the actual log directory.
- If the partition is assigned to an online log directory:
  - If the partition does not exist, it is created in the indicated log directory.
  - If the partition already exists in the indicated log directory and no future replica exists, then no action is taken.
  - If the partition already exists in the indicated log directory and there is a future replica in another log directory, then the broker starts the process to replicate the current replica to the future replica.
  - If the partition already exists in another online log directory and there is a future replica in the log directory indicated by the metadata, the broker will replace the current replica with the future replica after making sure that the future replica is fully caught up with the current replica.
  - Otherwise, if the partition already exists in another online log directory, the broker uses the new AssignReplicasToDirs RPC to ask the controller to change the metadata assignment to the actual log directory. The partition might have been moved to a different log directory while the broker was offline.
- If the partition is assigned to an unknown log directory or refers to Uuid.LOST_DIR:
  - If there are offline log directories, no action is taken: the assignment may refer to a log directory that is offline, and we don't want to fill the remaining online log directories with replicas that existed in the offline ones.
  - If there are no offline log directories, the broker selects a log directory and uses the new AssignReplicasToDirs RPC to notify the controller to create the metadata assignment to the actual log directory.
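The decision list above can be condensed into a single function. This is a minimal sketch under stated assumptions: Action, LocalReplica and decide are hypothetical names, directory UUIDs are modelled with Python's uuid.UUID (so Uuid.UNASSIGNED_DIR and Uuid.LOST_DIR become the integers 0 and 1), and any directory that is neither Uuid.UNASSIGNED_DIR nor online, including Uuid.MIGRATING_DIR, which the list does not mention, is assumed to fall into the last branch.

```python
import uuid
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Set

UNASSIGNED_DIR = uuid.UUID(int=0)
LOST_DIR = uuid.UUID(int=1)


class Action(Enum):
    NO_ACTION = auto()
    CREATE_IN_ASSIGNED_DIR = auto()         # create the replica where metadata says
    START_FUTURE_REPLICA_COPY = auto()      # replicate the current replica to the future one
    PROMOTE_FUTURE_WHEN_CAUGHT_UP = auto()  # swap in the future replica once caught up
    SELECT_DIR_AND_ASSIGN = auto()          # pick a directory, then send AssignReplicasToDirs
    ASSIGN_ACTUAL_DIR = auto()              # send AssignReplicasToDirs with the current directory


@dataclass
class LocalReplica:
    """Hypothetical view of what the broker finds on disk for one partition."""
    current_dir: Optional[uuid.UUID]        # None if the partition does not exist locally
    future_dir: Optional[uuid.UUID] = None  # directory holding a future replica, if any


def decide(assigned: uuid.UUID, local: LocalReplica,
           online_dirs: Set[uuid.UUID], has_offline_dirs: bool) -> Action:
    if assigned == UNASSIGNED_DIR:
        # Existing partition: report where it lives; new partition: choose a directory.
        if local.current_dir is not None:
            return Action.ASSIGN_ACTUAL_DIR
        return Action.SELECT_DIR_AND_ASSIGN
    if assigned in online_dirs:
        if local.current_dir is None:
            return Action.CREATE_IN_ASSIGNED_DIR
        if local.current_dir == assigned:
            if local.future_dir is None:
                return Action.NO_ACTION
            return Action.START_FUTURE_REPLICA_COPY
        if local.future_dir == assigned:
            return Action.PROMOTE_FUTURE_WHEN_CAUGHT_UP
        return Action.ASSIGN_ACTUAL_DIR
    # Unknown directory or LOST_DIR.
    if has_offline_dirs:
        return Action.NO_ACTION  # the replica may live in an offline directory
    return Action.SELECT_DIR_AND_ASSIGN
```

The order of the checks inside the online-directory branch matters: the "exists elsewhere" case is only reached after the future-replica case has been ruled out, matching the order of the bullets.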
If instead a single entry is configured under log.dirs or log.dir, then the AssignReplicasToDirs RPC is only sent to correct assignments to Uuid.LOST_DIR, as described above.
If the broker is configured with multiple log directories, it remains FENCED until it can verify that all partitions are assigned to the correct log directories in the cluster metadata. This excludes the log directory that hosts the cluster metadata topic, if it is configured separately to a different path using metadata.log.dir.
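A minimal sketch of that verification, with hypothetical names: dir_assignments_verified takes the number of entries in log.dirs, each locally hosted partition's actual directory, and the directory recorded for it in the cluster metadata. Partitions are keyed by illustrative strings, and a separately configured metadata.log.dir is modelled as an optional directory UUID whose contents are skipped.

```python
import uuid
from typing import Dict, Optional


def dir_assignments_verified(configured_dirs: int,
                             actual: Dict[str, uuid.UUID],
                             in_metadata: Dict[str, uuid.UUID],
                             metadata_log_dir: Optional[uuid.UUID] = None) -> bool:
    """True when every hosted partition's metadata assignment matches where it lives on disk."""
    if configured_dirs <= 1:
        return True  # this verification only applies to brokers with multiple log directories
    for partition, actual_dir in actual.items():
        if metadata_log_dir is not None and actual_dir == metadata_log_dir:
            continue  # a separately configured metadata.log.dir is excluded from the check
        if in_metadata.get(partition) != actual_dir:
            return False  # missing or mismatching assignment: stay FENCED
    return True
```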
...
In the diagram above, notice that if dir1 fails after the AssignReplicasToDirs RPC is sent but before the future replica is promoted, then the controller will not know to update leadership and ISR for the partition. If the destination directory has failed, it won't be possible to promote the future replica, and the broker needs to revert the assignment (or cancel it locally if it is still queued). If the source directory has failed, then the future replica might not catch up, and the controller might not update leadership and ISR for the partition. In this exceptional case, the broker issues an AssignReplicasToDirs RPC to the controller to assign the replica to Uuid.LOST_DIR; this lets the controller know that it needs to update leadership and ISR for this partition too.
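The failure cases in the paragraph above reduce to a small decision. This is a minimal sketch with hypothetical names (Recovery, on_dir_failure) that only covers the window between the reassignment and the promotion of the future replica. REVERT_ASSIGNMENT is assumed to mean assigning the replica back to its source directory; the text only says that the broker reverts the assignment.

```python
import uuid
from enum import Enum, auto


class Recovery(Enum):
    UNAFFECTED = auto()                # the failed directory is not part of this move
    CANCEL_QUEUED_ASSIGNMENT = auto()  # request not sent yet: drop it locally
    REVERT_ASSIGNMENT = auto()         # request already sent: assign back to the source
    ASSIGN_TO_LOST_DIR = auto()        # send AssignReplicasToDirs with Uuid.LOST_DIR


def on_dir_failure(failed: uuid.UUID, source: uuid.UUID, destination: uuid.UUID,
                   assignment_still_queued: bool) -> Recovery:
    """Recovery for a directory failure between reassignment and promotion."""
    if failed == destination:
        # The future replica can no longer be promoted, so the reassignment is undone.
        if assignment_still_queued:
            return Recovery.CANCEL_QUEUED_ASSIGNMENT
        return Recovery.REVERT_ASSIGNMENT
    if failed == source:
        # The future replica may never catch up; LOST_DIR tells the controller to
        # update leadership and ISR for this partition.
        return Recovery.ASSIGN_TO_LOST_DIR
    return Recovery.UNAFFECTED
```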
...
For any new partitions, the active controller will use Uuid.UNASSIGNED_DIR as the initial value for the log directory UUID of each replica; this is the default (empty) value for the tagged field. Each broker with multiple log.dirs hosting replicas then assigns a log directory UUID and communicates it back to the active controller using the new AssignReplicasToDirs RPC, so that cluster metadata can be updated with the log directory assignment. Brokers configured with a single log directory do not send this RPC.
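The text does not say how a broker tracks assignments between choosing a directory and seeing them reflected in cluster metadata, although the failure handling above mentions cancelling a still-queued assignment locally. A minimal sketch of one possible structure, with hypothetical names throughout: PendingAssignments keeps unsent assignments in arrival order, moves them to an in-flight map when they are drained into a request, and forgets them once metadata reflects them. outstanding() counts both, which is one plausible input for a gauge such as the QueuedReplicaToDirAssignments metric described below.

```python
import uuid
from collections import OrderedDict
from typing import Dict, List, Tuple

# Hypothetical replica key: (topic ID, partition index).
TopicPartition = Tuple[uuid.UUID, int]


class PendingAssignments:
    def __init__(self) -> None:
        self._queued: "OrderedDict[TopicPartition, uuid.UUID]" = OrderedDict()  # not yet sent
        self._inflight: Dict[TopicPartition, uuid.UUID] = {}  # sent, awaiting metadata

    def enqueue(self, tp: TopicPartition, dir_id: uuid.UUID) -> None:
        """Queue an assignment; a newer one for the same replica replaces an unsent older one."""
        self._queued.pop(tp, None)
        self._queued[tp] = dir_id

    def cancel(self, tp: TopicPartition) -> bool:
        """Drop an unsent assignment locally; returns False if none was queued."""
        return self._queued.pop(tp, None) is not None

    def drain(self, max_items: int) -> List[Tuple[TopicPartition, uuid.UUID]]:
        """Take up to max_items assignments, oldest first, for one AssignReplicasToDirs request."""
        batch = []
        while self._queued and len(batch) < max_items:
            tp, dir_id = self._queued.popitem(last=False)
            self._inflight[tp] = dir_id
            batch.append((tp, dir_id))
        return batch

    def on_metadata(self, tp: TopicPartition, dir_id: uuid.UUID) -> None:
        """Forget an in-flight assignment once cluster metadata shows the same directory."""
        if self._inflight.get(tp) == dir_id:
            del self._inflight[tp]

    def outstanding(self) -> int:
        return len(self._queued) + len(self._inflight)
```

Keying by replica means a broker that changes its mind before sending (for example after a directory failure) never sends two conflicting assignments for the same replica in one batch.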
...
- If there are no indicated online log directory UUIDs, the request is invalid and the controller replies with error 42, INVALID_REQUEST.
- If multiple log directories are registered, the broker will remain fenced until the controller learns of all the partition to log directory placements in that broker, i.e. until no replicas remain assigned to Uuid.UNASSIGNED_DIR. The broker will indicate these placements using the AssignReplicasToDirs RPC.
  - The broker remains fenced by not asking to be unfenced in its heartbeat requests until the number of mismatching replica to log directory assignments is zero. This number is represented by the new metric QueuedReplicaToDirAssignments.
- If multiple log directories are registered and some of them are new (not present in the previous registration), then these log directories are assumed to be empty. If they are not, the broker will use the AssignReplicasToDirs RPC to correct the assignments and will not become UNFENCED before the metadata is correct.
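A minimal sketch of the controller-side checks in the list above, assuming the list describes how the controller handles a broker registration. validate_registration, keep_fenced and new_dirs are hypothetical names; returning 0 for success mirrors Kafka's NONE error code, and 42 is the INVALID_REQUEST code named in the text.

```python
import uuid
from typing import Iterable, List, Sequence

UNASSIGNED_DIR = uuid.UUID(int=0)
NONE = 0              # Kafka error code for "no error"
INVALID_REQUEST = 42  # error code named in the text


def validate_registration(online_log_dirs: Sequence[uuid.UUID]) -> int:
    """Reject a registration that indicates no online log directory UUIDs."""
    return INVALID_REQUEST if len(online_log_dirs) == 0 else NONE


def keep_fenced(registered_dirs: Sequence[uuid.UUID],
                replica_dirs_on_broker: Iterable[uuid.UUID]) -> bool:
    """With multiple log directories, this condition keeps the broker fenced while any
    of its replicas is still assigned to UNASSIGNED_DIR."""
    if len(registered_dirs) <= 1:
        return False
    return any(d == UNASSIGNED_DIR for d in replica_dirs_on_broker)


def new_dirs(previous: Sequence[uuid.UUID], current: Sequence[uuid.UUID]) -> List[uuid.UUID]:
    """Directories absent from the previous registration; these are assumed empty."""
    seen = set(previous)
    return [d for d in current if d not in seen]
```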
...
- As per KIP-866, a separate controller quorum is set up first, and only then are the existing brokers reconfigured and upgraded.
- When configured for the migration and while still in ZK mode, brokers will:
  - update meta.properties to generate and include directory.id;
  - send BrokerRegistrationRequest including the log directory UUIDs;
  - notify the controller of log directory failures via BrokerHeartbeatRequest.
- During the migration, the controller:
  - persists log directories indicated in broker registration requests in the cluster metadata;
  - relies on heartbeat requests to detect log directory failure instead of monitoring the ZK znode for notifications;
  - still uses full LeaderAndIsr requests to process log directory failures for any brokers still running in ZK mode.
- Brokers restarting into KRaft mode will want to stay fenced until their log directory assignments for all hosted partitions are persisted in the cluster metadata.
- The active controller will also ensure that any given broker stays fenced until it learns of all partition to log directory assignments in that specific broker via the new AssignReplicasToDirs RPC.
- During the migration, existing replicas are assigned to the log directory Uuid.MIGRATING_DIR until the active controller learns the actual log directory from a broker running in KRaft mode.
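A minimal sketch of the first broker step in the migration list, adding directory.id to meta.properties only if it is missing. Several things here are assumptions: parse_properties, new_directory_id and ensure_directory_id are hypothetical helpers, the parser handles only simple key=value lines and "#" comments (not the full Java properties syntax), and the ID is encoded as URL-safe base64 without padding, which we believe matches how Kafka renders Uuid values as strings.

```python
import base64
import secrets
from typing import Dict

RESERVED_COUNT = 100  # assumption: the first 100 UUIDs (integers 0..99) are reserved


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser: skips blank lines and '#' comments, no escapes."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def new_directory_id() -> str:
    """Random 128-bit ID outside the reserved range, as unpadded URL-safe base64."""
    while True:
        value = secrets.randbits(128)
        if value >= RESERVED_COUNT:
            raw = value.to_bytes(16, "big")
            return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def ensure_directory_id(text: str) -> str:
    """Return meta.properties content that includes directory.id, adding one only if missing."""
    if "directory.id" in parse_properties(text):
        return text  # keep an existing ID stable across restarts
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{separator}directory.id={new_directory_id()}\n"
```

Leaving an existing directory.id untouched matters because the controller persists these IDs in the cluster metadata; regenerating one would make the directory look new.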
...