Versions Compared

Comment: Handling AlterReplicaLogDirs

...

  • If the partition is not assigned to a log directory (refers to Uuid.ZERO)
    • If the partition already exists, the broker uses the new RPC — AssignReplicasToDirectories — to notify the controller to change the metadata assignment to the actual log directory.
    • If the partition is new, the broker selects a log directory and uses the new RPC — AssignReplicasToDirectories — to notify the controller to create the metadata assignment to the actual log directory.
  • If the partition is assigned to an online log directory
    • If the partition is new, or if there are no offline log directories, it is created in the indicated log directory.
    • If the partition already exists in the indicated log directory, no action is taken.
    • If the partition already exists in another log directory and is a future replica in the log directory indicated by the metadata, the broker will replace the current replica with the future replica.
    • If the partition already exists in another log directory, the broker uses the new RPC (AssignReplicasToDirectories) to notify the controller to change the metadata assignment to the actual log directory. The partition might have been moved to a different log directory whilst the broker was offline.
    • If the partition is not new and does not exist, and there are any offline log directories, the broker uses AssignReplicasToDirectories to change the metadata assignment to an offline log directory. This is in line with current behavior: if any log directories are offline, non-new replicas are not created. The assignment to an offline log directory signals to the controller that the replica is actually offline. This prevents a broker without data (due to a failed disk) from continuing as the leader in case of any synchronisation failure of the replica-to-log-directory assignment between the broker and the metadata.
  • If the partition is assigned to an offline log directory, no action is taken — the controller is already aware of this.
  • If the partition is assigned to an unknown log directory, no action is taken — the controller is already aware of this and will reassign the replica to one of the online log directories in a future metadata update. 
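
The rules in the list above can be read as a single decision function. The following is a minimal Python sketch of that reading, not broker code: `Action`, `reconcile`, the `UNASSIGNED` stand-in for Uuid.ZERO, and all parameter names are hypothetical, and log directory UUIDs are modelled as plain strings. It assumes that the "created in the indicated log directory" rule applies only when no local replica exists, and that an unassigned partition with no local replica is handled like a new one.

```python
from enum import Enum, auto
from typing import Optional, Set

UNASSIGNED = "ZERO"  # hypothetical stand-in for Uuid.ZERO


class Action(Enum):
    NONE = auto()                    # controller is already aware, or nothing to do
    CREATE_IN_ASSIGNED_DIR = auto()  # create the replica where the metadata says
    SELECT_DIR_AND_NOTIFY = auto()   # pick a directory, then AssignReplicasToDirectories
    NOTIFY_ACTUAL_DIR = auto()       # AssignReplicasToDirectories with the actual directory
    PROMOTE_FUTURE_REPLICA = auto()  # replace the current replica with the future replica
    NOTIFY_OFFLINE_DIR = auto()      # AssignReplicasToDirectories with an offline directory


def reconcile(assigned_dir: str,
              online_dirs: Set[str],
              offline_dirs: Set[str],
              current_dir: Optional[str],
              future_dir: Optional[str],
              is_new: bool) -> Action:
    """Pick the action for one partition from its metadata assignment and local state."""
    if assigned_dir == UNASSIGNED:
        if current_dir is not None:
            return Action.NOTIFY_ACTUAL_DIR
        # Assumption: treated like a new partition when there is no local replica.
        return Action.SELECT_DIR_AND_NOTIFY
    if assigned_dir in online_dirs:
        if current_dir == assigned_dir:
            return Action.NONE
        if current_dir is not None:
            # The replica lives in a different directory than the metadata indicates.
            if future_dir == assigned_dir:
                return Action.PROMOTE_FUTURE_REPLICA
            return Action.NOTIFY_ACTUAL_DIR
        # No local replica exists.
        if is_new or not offline_dirs:
            return Action.CREATE_IN_ASSIGNED_DIR
        return Action.NOTIFY_OFFLINE_DIR
    # Assigned to an offline or unknown directory: no action is taken.
    return Action.NONE
```

For example, `reconcile("d1", {"d1"}, {"d3"}, None, None, False)` selects `Action.NOTIFY_OFFLINE_DIR`, since the replica is not new, is missing locally, and one directory is offline.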

If the broker is configured with multiple log directories, it remains FENCED until it can verify that all partitions are assigned to the correct log directories in the cluster metadata. This excludes the log directory that hosts the cluster metadata topic, if it is configured separately to a different path using metadata.log.dir.

When replicas are moved between directories, using the existing AlterReplicaLogDirs RPC, the receiving broker will start moving the replicas using AlterReplicaLogDirs threads as usual. When a future replica first catches up, the broker will asynchronously communicate the log directory change to the controller using the new RPC (AssignReplicasToDirectories) but keep the AlterReplicaLogDirs thread going. Once the broker receives confirmation of the metadata change, it briefly blocks appends to the old replica, makes sure the future log fully catches up, and makes the switch. By delaying the metadata change until the future replica has caught up, we minimize the chance of a log directory failure happening with an incorrect replica-to-log-directory assignment in the metadata.
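
The ordering of these steps can be illustrated with a toy model. The Python sketch below is only a simulation under simplifying assumptions: `ReplicaMove` and its methods are hypothetical names, logs are reduced to end offsets, and the asynchronous RPC is represented by a flag rather than a real request.

```python
class ReplicaMove:
    """Toy model of moving one replica to a future log in another directory."""

    def __init__(self, current_end_offset: int) -> None:
        self.current_end_offset = current_end_offset  # end offset of the old replica
        self.future_end_offset = 0                    # end offset of the future replica
        self.assignment_requested = False             # stands in for the async RPC
        self.appends_blocked = False
        self.switched = False

    def append(self, records: int) -> None:
        """Producer appends keep landing on the old replica during the move."""
        if self.appends_blocked:
            raise RuntimeError("appends are briefly blocked during the switch")
        self.current_end_offset += records

    def copy_step(self, max_records: int) -> None:
        """One iteration of the copying thread; it keeps running after the RPC is sent."""
        self.future_end_offset = min(self.current_end_offset,
                                     self.future_end_offset + max_records)
        caught_up = self.future_end_offset == self.current_end_offset
        if caught_up and not self.assignment_requested:
            # First catch-up: notify the controller, but do not switch yet.
            self.assignment_requested = True

    def on_assignment_confirmed(self) -> None:
        """Controller confirmed the metadata change: block, finish copying, switch."""
        if not self.assignment_requested:
            raise RuntimeError("confirmation received before any request was sent")
        self.appends_blocked = True
        self.future_end_offset = self.current_end_offset  # final catch-up
        self.switched = True
        self.appends_blocked = False
```

In this model, appends that arrive between the first catch-up and the confirmation are still copied before the switch, which mirrors the reason for keeping the copying thread running.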

Metadata caching

Replicas are considered offline if the replica references a log directory which is not in the list of online log directories for the broker ID hosting the replica.
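
As an illustration, this rule amounts to a set-membership check. The Python sketch below applies the sentence literally. The function name and the shape of the `online_dirs_by_broker` mapping are assumptions made for the example, not part of the proposal.

```python
from typing import Dict, Set


def is_replica_offline(replica_dir: str,
                       broker_id: int,
                       online_dirs_by_broker: Dict[int, Set[str]]) -> bool:
    """A replica is offline if its directory is not online on its hosting broker."""
    return replica_dir not in online_dirs_by_broker.get(broker_id, set())
```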

Handling log directory failures

When one or more log directories become offline, the broker will communicate this change using the new field LogDirsOfflined in the BrokerHeartbeat request, indicating the UUIDs of the newly offline log directories. The UUIDs for the newly failed log directories are included in the BrokerHeartbeat request until the broker receives a successful response.
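
The retry behavior described above can be sketched as a small piece of bookkeeping. In the Python example below, `OfflineDirReporter` and its methods are hypothetical, and a heartbeat is modelled as a plain dictionary carrying only the LogDirsOfflined field. The real request has other fields that are omitted here.

```python
from typing import Dict, List, Set


class OfflineDirReporter:
    """Tracks failed log directories that the controller has not yet acknowledged."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def on_dir_failed(self, dir_id: str) -> None:
        self._pending.add(dir_id)

    def build_heartbeat(self) -> Dict[str, List[str]]:
        # Every heartbeat repeats all directories that are still unacknowledged.
        return {"LogDirsOfflined": sorted(self._pending)}

    def on_response(self, request: Dict[str, List[str]], success: bool) -> None:
        # Only directories carried by a successful request stop being reported.
        if success:
            self._pending -= set(request["LogDirsOfflined"])
```

Clearing only the directories that were in the acknowledged request means a directory that fails while a heartbeat is in flight is still reported in the next one.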

If a log directory fails and the active controller learns of the failed log directory at a time when the replica-to-log-directory assignment in the metadata is incorrect, then the controller might not know to update the leadership and ISR for some replicas. If some of those replicas are leaders, those partitions will become unavailable. As a fallback safety mechanism in case of any synchronisation issues, when a log directory fails, the broker will check log directory assignments in the metadata and, if there are any missing replicas, the broker will use AlterReplicaLogDirs, assigning the replicas to the offline log directory, to convey to the controller that these replicas are offline and need leadership and ISR updates.
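
One way to read the fallback check is as a comparison between what the broker hosted in the failed directory and what the metadata records. The Python sketch below assumes that "missing replicas" means replicas the broker had in the failed directory whose metadata assignment points elsewhere. The function name, parameter names, and the use of strings for partitions and directory UUIDs are all hypothetical.

```python
from typing import Dict, List, Set


def replicas_needing_offline_assignment(failed_dir: str,
                                        local_replicas_by_dir: Dict[str, Set[str]],
                                        metadata_assignment: Dict[str, str]) -> List[str]:
    """Return partitions hosted in the failed directory that the metadata places elsewhere."""
    hosted = local_replicas_by_dir.get(failed_dir, set())
    # Replicas already assigned to the failed directory need no correction.
    return sorted(p for p in hosted if metadata_assignment.get(p) != failed_dir)
```

The returned partitions are the ones the broker would then report against the offline log directory, so that the controller can update leadership and ISR for them.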

Controller

Replica placement

...