...
An AssignReplicasToDirs request that includes an assignment to Uuid.LOST_DIR conveys that the broker wants to correct a replica assignment into an offline log directory that cannot be identified.
This request is authorized with CLUSTER_ACTION on CLUSTER.
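As a rough illustration of these request semantics, the sketch below models an AssignReplicasToDirs request as a mapping from partitions to directory UUIDs and picks out the entries that point at the LOST_DIR sentinel. The class names, the field layout, and the value chosen for `LOST_DIR` are hypothetical placeholders. They are not Kafka's actual request schema or reserved constants.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical stand-in for Uuid.LOST_DIR; Kafka defines the real reserved value.
LOST_DIR = uuid.UUID(int=1)


@dataclass(frozen=True)
class TopicIdPartition:
    topic_id: uuid.UUID
    partition: int


@dataclass
class AssignReplicasToDirsRequest:
    broker_id: int
    broker_epoch: int
    # Hypothetical layout: partition -> UUID of the log directory it is assigned to.
    assignments: dict = field(default_factory=dict)


def lost_replicas(request: AssignReplicasToDirsRequest) -> set:
    """Return the partitions the broker reports as living in an offline
    log directory that it cannot identify (i.e. assigned to LOST_DIR)."""
    return {tp for tp, dir_id in request.assignments.items() if dir_id == LOST_DIR}
```

A controller-side handler could treat every partition returned by `lost_replicas` as one whose leadership and ISR must be re-evaluated.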
Proposed changes
Metrics
MBean name | Description |
---|---|
kafka.server:type=KafkaServer,name=QueuedReplicaToDirAssignments | The number of replicas hosted by the broker that are either missing a log directory assignment in the cluster metadata or are currently found in a different log directory, and are queued to be sent to the controller in an AssignReplicasToDirs request. |
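To make the gauge's definition concrete, the sketch below counts hosted replicas whose on-disk directory is missing from, or differs from, the cluster metadata. It assumes every such mismatch has already been queued for sending, so the count equals the queue length; the function and argument names are hypothetical and this is not how the broker actually computes the metric.

```python
def queued_replica_to_dir_assignments(metadata_dirs: dict, local_dirs: dict) -> int:
    """Count hosted replicas whose directory assignment still has to be sent.

    metadata_dirs: partition -> directory UUID recorded in the cluster metadata
                   (a missing key means no assignment is recorded yet).
    local_dirs:    partition -> directory UUID where the broker actually finds
                   the replica on disk.
    """
    return sum(
        1
        for partition, dir_id in local_dirs.items()
        # Missing assignment (get() returns None) or a different directory.
        if metadata_dirs.get(partition) != dir_id
    )
```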
...
When multiple log directories are configured, and some (but not all) of them go offline, the broker communicates this change using the new field OfflineLogDirs in the BrokerHeartbeat request, indicating the UUIDs of the newly offline log directories. The UUIDs of all accumulated failed log directories are included in every BrokerHeartbeat request until the broker restarts. If the broker is configured with a single log directory, this field isn't used, as the current behavior of the broker is to shut down when no log directories are online.
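The accumulate-until-restart behavior can be sketched as a small tracker whose set of failed directories only ever grows, so each heartbeat repeats every failure seen so far. The class and method names are hypothetical, and the heartbeat is reduced to a plain dict carrying only BrokerId, BrokerEpoch, and OfflineLogDirs rather than the full request.

```python
class OfflineDirTracker:
    """Hypothetical broker-side bookkeeping for the OfflineLogDirs heartbeat field.

    Failed directories are never removed, so every later heartbeat repeats
    them until the process restarts (i.e. a new tracker is created).
    """

    def __init__(self, configured_dirs):
        self.configured_dirs = frozenset(configured_dirs)
        self.offline_dirs = set()

    def mark_offline(self, dir_id) -> bool:
        """Record a failed directory. Returns True when no directory is left
        online, in which case the broker shuts down instead of reporting."""
        if dir_id not in self.configured_dirs:
            raise ValueError(f"unknown log directory {dir_id}")
        self.offline_dirs.add(dir_id)
        return self.offline_dirs == self.configured_dirs

    def heartbeat_fields(self, broker_id: int, broker_epoch: int) -> dict:
        request = {"BrokerId": broker_id, "BrokerEpoch": broker_epoch}
        # With a single log directory the field is not used.
        if len(self.configured_dirs) > 1:
            request["OfflineLogDirs"] = sorted(self.offline_dirs)
        return request
```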
Log directory failure notifications are queued and batched together in all future broker heartbeat requests. If there are any queued partition-to-directory assignments (sent in AssignReplicasToDirs) to send to the controller, those that involve any of the newly failed log directories (i.e. assignments either into or out of these directories) are prioritized and sent first. The broker retries these until it receives a successful reply, which conveys that the metadata change has been persisted. This ensures that the controller is in sync with regard to partition-to-directory assignments and can reliably determine which partitions need leadership and ISR updates.
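A minimal sketch of the prioritization and retry described above, assuming the queue is a list of (partition, source, target) records. The reordering is stable, so assignments touching a failed directory move to the front without otherwise changing order. The `max_attempts` bound is an assumption added only to keep the example finite; the text describes the broker retrying until it succeeds, with log.dir.failure.timeout.ms as the escape hatch. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class QueuedAssignment:
    partition: str
    source_dir: Optional[str]  # None if the replica had no previous assignment
    target_dir: str


def prioritize(queue: Iterable[QueuedAssignment], failed_dirs: set) -> List[QueuedAssignment]:
    """Stable reordering: assignments into or out of a failed directory go first."""
    items = list(queue)

    def touches_failed(a: QueuedAssignment) -> bool:
        return a.source_dir in failed_dirs or a.target_dir in failed_dirs

    urgent = [a for a in items if touches_failed(a)]
    rest = [a for a in items if not touches_failed(a)]
    return urgent + rest


def send_until_persisted(
    assignment: QueuedAssignment,
    send: Callable[[QueuedAssignment], bool],
    max_attempts: int = 10,
) -> int:
    """Resend until `send` reports the change was persisted; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(assignment):
            return attempt
    raise TimeoutError(f"assignment for {assignment.partition} not persisted")
```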
If, after a configurable amount of time (log.dir.failure.timeout.ms), the broker has repeatedly failed to communicate a log directory failure, or a replica assignment into a failed directory, and it is the leader for any replicas in the failed log directory, the broker will shut down, as that is the only other way to guarantee that the controller will elect a new leader for those partitions.
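The shutdown rule can be sketched as a timer that starts at the first unacknowledged failure and fires only once log.dir.failure.timeout.ms has elapsed while the broker still leads a replica in the failed directory. The class name and the simplification that any controller acknowledgement clears the whole pending state are assumptions for illustration.

```python
from typing import Optional


class DirFailureShutdownTimer:
    """Hypothetical tracker for the log.dir.failure.timeout.ms rule."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.pending_since_ms: Optional[int] = None

    def on_unacknowledged_failure(self, now_ms: int) -> None:
        # Only the first unacknowledged failure starts the clock.
        if self.pending_since_ms is None:
            self.pending_since_ms = now_ms

    def on_controller_ack(self) -> None:
        # Simplification: one successful reply clears all pending state.
        self.pending_since_ms = None

    def should_shut_down(self, now_ms: int, leads_replica_in_failed_dir: bool) -> bool:
        if self.pending_since_ms is None or not leads_replica_in_failed_dir:
            return False
        return now_ms - self.pending_since_ms >= self.timeout_ms
```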
...
The diagram below illustrates the sequence of steps involved in moving a replica between log directories.
In the diagram above, notice that if dir1 fails after the AssignReplicasToDirs RPC is sent, but before the future replica is promoted, then the controller will not know to update leadership and ISR for the partition. If the destination directory has failed, it won't be possible to promote the future replica, and the broker needs to revert the assignment (or cancel it locally if it is still queued). If the source directory has failed, then the future replica might not catch up, and the controller might not update leadership and ISR for the partition. In this exceptional case, the broker issues an AssignReplicasToDirs RPC to the controller to assign the replica to Uuid.LOST_DIR, which lets the controller know that it needs to update leadership and ISR for this partition too.
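The two recovery cases above can be sketched as a single decision function. It returns an action tag and, where relevant, the directory the replica should be reassigned to. Treating a failure of both directories as the "lost" case is an assumption, as are the function name, the action tags, and the string stand-in for Uuid.LOST_DIR.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Hypothetical stand-in for Uuid.LOST_DIR; Kafka defines the real reserved value.
LOST_DIR = "LOST_DIR"


@dataclass(frozen=True)
class DirMove:
    partition: str
    source_dir: str
    target_dir: str


def on_dir_failure_during_move(
    move: DirMove, failed_dirs: Set[str], still_queued: bool
) -> Tuple[str, Optional[str]]:
    """Decide how to correct a replica whose move was interrupted by a log
    directory failure before the future replica was promoted."""
    if move.source_dir in failed_dirs:
        # The future replica may never catch up: report the replica as lost so
        # the controller updates leadership and ISR for the partition.
        return ("assign", LOST_DIR)
    if move.target_dir in failed_dirs:
        # The future replica cannot be promoted: undo the assignment.
        if still_queued:
            return ("cancel", None)  # the controller never saw it
        return ("assign", move.source_dir)
    return ("none", None)
```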
...
- As per KIP-866, a separate controller quorum is set up first, and only then are the existing brokers reconfigured and upgraded.
- When configured for the migration and while still in ZK mode, brokers will:
  - update meta.properties to generate and include directory.id;
  - send BrokerRegistrationRequest including the log directory UUIDs;
  - notify the controller of log directory failures via BrokerHeartbeatRequest.
- During the migration, the controller:
  - persists log directories indicated in broker registration requests in the cluster metadata;
  - relies on heartbeat requests to detect log directory failure instead of monitoring the ZK znode for notifications;
  - still uses full LeaderAndIsr requests to process log directory failures for any brokers still running in ZK mode;
  - persists directory assignments received via the AssignReplicasToDirs RPC.
- The brokers restarting into KRaft mode will want to stay fenced until their log directory assignments for all hosted partitions are persisted in the cluster metadata.
- The active controller will also ensure that any given broker stays fenced until it learns of all partition-to-log-directory assignments in that specific broker via the new AssignReplicasToDirs RPC.
- During the migration, existing replicas are assigned to log directory Uuid.MIGRATING_DIR until the active controller learns the actual log directory from a broker running in KRaft mode.
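The fencing condition in the last points can be sketched as a predicate over the persisted assignments. A broker may be unfenced only when every partition it hosts has a concrete directory in the cluster metadata, so a missing entry or the MIGRATING_DIR placeholder keeps it fenced. The function name and the string stand-in for Uuid.MIGRATING_DIR are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical stand-in for Uuid.MIGRATING_DIR; Kafka defines the real reserved value.
MIGRATING_DIR = "MIGRATING_DIR"


def may_unfence(hosted_partitions: Iterable[str], persisted_dirs: Dict[str, Optional[str]]) -> bool:
    """True only if every hosted partition has a real directory assignment
    persisted; placeholder (MIGRATING_DIR) or missing entries keep it fenced."""
    return all(
        persisted_dirs.get(tp) not in (None, MIGRATING_DIR)
        for tp in hosted_partitions
    )
```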
...