Versions Compared

Comment: dir failure handler verifies metadata

...

  • If the partition is not assigned to a log directory (refers to Uuid.ZERO)
    • If the partition already exists, the broker uses the new RPC — AssignReplicasToDirs — to notify the controller to change the metadata assignment to the actual log directory.
    • If the partition does not exist, the broker selects a log directory and uses the new RPC — AssignReplicasToDirs — to notify the controller to create the metadata assignment to the actual log directory.
  • If the partition is assigned to an online log directory
    • If the partition does not exist, it is created in the indicated log directory.
    • If the partition already exists in the indicated log directory and no future replica exists, then no action is taken.
    • If the partition already exists in the indicated log directory, and there is a future replica in another log directory, then the broker starts the process to replicate the current replica to the future replica.
    • If the partition already exists in another online log directory and is a future replica in the log directory indicated by the metadata, the broker will replace the current replica with the future replica after making sure that the future replica is fully caught up with the current replica.
    • If the partition already exists in another online log directory, the broker uses the new RPC, AssignReplicasToDirs, to notify the controller to change the metadata assignment to the actual log directory. The partition might have been moved to a different log directory whilst the broker was offline.
  • If the broker knows that the partition already exists in a different log directory that is now offline, then the controller might not have known to update leadership and ISR when the log directory failure was communicated to it. This is unlikely, but possible if some synchronisation failure affected the replica to log directory assignment between the broker and the metadata. To prevent the partition from remaining offline, the broker uses AssignReplicasToDirs to change the metadata assignment to the offline log directory. The controller then learns that the replica is offline and updates leadership and ISR.
  • If the partition is assigned to an offline log directory, no action is taken: the controller is already aware of this, and we don't want to fill the remaining online log directories with replicas that existed in the offline ones.
  • If the partition is assigned to an unknown log directory, no action is taken — the controller is already aware of this and will reassign the replica to one of the online log directories in a future metadata update. 
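
The cases above can be read as a single decision function. The following is a minimal sketch in Python, not the broker's actual implementation: the names `Action`, `LocalState` and `reconcile`, and the use of plain strings for directory IDs, are assumptions made for illustration only. It also assumes that a partition known to live in an offline directory is only handled specially when the metadata points at an online directory.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Set

UUID_ZERO = "ZERO"  # hypothetical stand-in for Uuid.ZERO (no directory assigned)


class Action(Enum):
    ASSIGN_TO_ACTUAL_DIR = auto()      # AssignReplicasToDirs with the directory holding the replica
    SELECT_DIR_AND_ASSIGN = auto()     # pick a directory, then AssignReplicasToDirs
    CREATE_IN_ASSIGNED_DIR = auto()
    NO_ACTION = auto()
    START_FUTURE_REPLICATION = auto()  # copy the current replica to the future replica
    PROMOTE_FUTURE_REPLICA = auto()    # swap once the future replica has caught up
    ASSIGN_TO_OFFLINE_DIR = auto()     # AssignReplicasToDirs with the offline directory


@dataclass(frozen=True)
class LocalState:
    current_dir: Optional[str]         # directory of the current replica, None if the partition does not exist
    future_dir: Optional[str] = None   # directory of a future replica, if any


def reconcile(assigned_dir: str, local: LocalState,
              online_dirs: Set[str], offline_dirs: Set[str]) -> Action:
    """Map a metadata assignment and the broker's local state to one action."""
    if assigned_dir == UUID_ZERO:
        if local.current_dir is not None:
            return Action.ASSIGN_TO_ACTUAL_DIR
        return Action.SELECT_DIR_AND_ASSIGN

    if assigned_dir in online_dirs:
        if local.current_dir is None:
            return Action.CREATE_IN_ASSIGNED_DIR
        if local.current_dir == assigned_dir:
            if local.future_dir is None:
                return Action.NO_ACTION
            return Action.START_FUTURE_REPLICATION
        if local.current_dir in online_dirs:
            if local.future_dir == assigned_dir:
                return Action.PROMOTE_FUTURE_REPLICA
            return Action.ASSIGN_TO_ACTUAL_DIR
        if local.current_dir in offline_dirs:
            return Action.ASSIGN_TO_OFFLINE_DIR
        raise ValueError("replica lives in a directory the broker does not know")

    # Assigned to an offline or unknown directory: the controller already knows.
    return Action.NO_ACTION
```

Keeping the decision free of side effects makes each bullet easy to check in isolation; the RPC calls and replica moves would be carried out by whatever consumes the returned action.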

...

When one or more log directories become offline, the broker communicates this change using the new field LogDirsOfflined in the BrokerHeartbeat request, which carries the UUIDs of the newly offline log directories. The UUIDs of the newly failed log directories are included in every BrokerHeartbeat request until the broker receives a successful response.
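
The retry behaviour can be sketched as a small piece of bookkeeping on the broker side. This is an illustrative Python sketch under stated assumptions: the class `OfflineDirReporter` and its methods are hypothetical, and the heartbeat is modelled as a plain dictionary holding only the LogDirsOfflined field rather than the full request.

```python
class OfflineDirReporter:
    """Tracks failed log directories that the controller has not yet acknowledged."""

    def __init__(self):
        self._pending = set()

    def on_dir_failed(self, dir_id):
        # Called when a log directory goes offline at runtime.
        self._pending.add(dir_id)

    def build_heartbeat(self):
        # Every heartbeat repeats all directory UUIDs that are still unacknowledged.
        return {"LogDirsOfflined": sorted(self._pending)}

    def on_response(self, request, success):
        # Only a successful response clears the UUIDs carried by that request;
        # directories that failed after the request was built stay pending.
        if success:
            self._pending.difference_update(request["LogDirsOfflined"])
```

Clearing only the UUIDs that were part of the acknowledged request, rather than the whole pending set, avoids dropping a directory that failed while the request was in flight.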

Because the broker is proactive in communicating any log directory assignment changes to the controller, the metadata should be up to date and correct when the controller is notified of a failed log directory. However, the consequences of an incorrect partition assignment, whether due to some error or a race condition, can be quite damaging: if the active controller learns of a failed log directory at a time when the replica-to-log-directory assignment in the metadata is incorrect, it might not know to update leadership and ISR for some replicas, and if any of those replicas are leaders, their partitions become unavailable for an indefinite amount of time. So, as a fallback safety mechanism in case of any synchronisation issues, when handling a runtime log directory failure, the broker must verify the assignments for the newly failed partitions against the latest metadata. For any missing or incorrect assignments, the broker uses AlterReplicaLogDirs to assign the replicas to the offline log directory, conveying to the controller that these replicas are offline so that it can update leadership and ISR.
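
The verification step amounts to comparing the replicas the broker hosted in the failed directory with what the metadata says. Below is a minimal Python sketch, assuming partitions are identified by `(topic, index)` tuples and directory IDs by strings; the function name `find_assignments_to_correct` and its arguments are hypothetical and not part of the proposal.

```python
def find_assignments_to_correct(failed_dir, replicas_in_failed_dir, metadata_assignments):
    """Return {partition: failed_dir} for replicas whose metadata assignment is wrong.

    replicas_in_failed_dir: partitions the broker hosted in the failed directory.
    metadata_assignments:   partition -> directory ID according to the latest metadata.
    """
    corrections = {}
    for partition in sorted(replicas_in_failed_dir):
        # A missing entry or a different directory both count as incorrect.
        if metadata_assignments.get(partition) != failed_dir:
            corrections[partition] = failed_dir
    return corrections
```

For example, if the broker hosted `("orders", 0)` and `("orders", 1)` in the failed directory but the metadata places `("orders", 1)` elsewhere, only `("orders", 1)` is returned, and the broker would send a corrective assignment for that replica alone.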

Controller

Replica placement

...