
...

Support for multiple log directories per broker, aka JBOD (Just a Bunch Of Disks), came in KIP-112. Since then, JBOD has been an important feature in Kafka, allowing it to run on large deployments with multiple local storage devices per broker.


To ensure availability, when a partition leader fails, the controller should elect a new leader from one of the other in-sync replicas. However, the controller does not check whether each leader is correctly performing its duties; it simply assumes that each broker is working correctly if it is still an active member of the cluster. In KRaft, cluster membership is based on timely heartbeat requests sent by each broker to the active controller. In ZooKeeper, cluster membership is based on an ephemeral znode under /brokers/ids.

In KRaft mode, when a single log directory fails, the broker can no longer be either a leader or a follower for any partitions in that log directory. The controller, however, has no signal that it needs to update leadership and ISR for the replicas in that log directory, because the broker continues to send heartbeat requests.

In ZooKeeper mode, when a log directory fails, the broker sends a notification to the controller, which then sends a full LeaderAndIsr request to the broker, listing all the partitions for all log directories for that broker. The controller relies on per-partition error results from the broker to update leadership and ISR for the replicas in the failed log directory. Without this notification, the partitions with leadership on that log directory would not get a new leader assigned and would remain unavailable.

Support for JBOD in KRaft was proposed and accepted back in KIP-589, with a new RPC from the broker to the controller indicating the affected topic partitions in a failed log directory. However, the implementation was never merged, and concerns were raised about potentially large requests from the broker to the controller.

KIP-833 was accepted, with plans to mark KRaft as production ready and deprecate ZooKeeper mode, but JBOD is still a missing feature in KRaft. This KIP aims to provide support for JBOD in KRaft, while avoiding any RPC having to list all the partitions in a log directory.

...

If some or all of the log directories are new, then the format command should be used. Otherwise, update-directories can be used to update the two properties: directory.id and directory.ids. This ensures that each log directory configured in log.dirs has a unique UUID assigned under the property directory.id in its meta.properties, and that the full set of UUIDs for all configured log.dirs is persisted in directory.ids in all meta.properties files.

meta.properties

The meta.properties version field will be bumped from 1 to 2. Two new properties, directory.id and directory.ids, will be added to the meta.properties file in each log directory, including the metadata.log.dir. The first property, directory.id, indicates the UUID for the log directory where the file is located. The second property, directory.ids, lists the UUIDs of all the configured log directories. If the meta.properties file doesn't exist for the metadata.log.dir, the Kafka node will fail to start. If the meta.properties file exists but doesn't contain these two properties, they will be generated and the meta.properties files will be updated. The kafka-storage.sh tool will be extended to generate and update the two properties as described in the previous section.
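As a rough illustration of the file layout described above, the sketch below parses the text of a version 2 meta.properties file and extracts the two new properties. The comma-separated encoding of directory.ids, the sample UUID strings, and the helper name parse_meta_properties are assumptions made for illustration only; this section does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetaProperties:
    version: int
    directory_id: Optional[str]       # value of directory.id, if present
    directory_ids: Tuple[str, ...]    # values of directory.ids, if present


def parse_meta_properties(text: str) -> MetaProperties:
    """Parse key=value lines from a meta.properties file (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()

    # Assumption: directory.ids is stored as a comma-separated list of UUIDs.
    raw_ids = props.get("directory.ids", "")
    ids = tuple(part.strip() for part in raw_ids.split(",") if part.strip())

    return MetaProperties(
        version=int(props.get("version", "0")),
        directory_id=props.get("directory.id"),
        directory_ids=ids,
    )


# Placeholder UUID strings, not real Kafka UUIDs.
SAMPLE = """\
version=2
directory.id=dir-uuid-a
directory.ids=dir-uuid-a,dir-uuid-b
"""

meta = parse_meta_properties(SAMPLE)
```

A file that lacks the two properties parses to directory_id=None and an empty directory_ids tuple, which is the case where the text above says the properties are generated.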

...

{ "name": "Replicas", "type": "[]int32", "versions": "0+", "entityType": "brokerId",
  "about": "The replicas of this partition, sorted by preferred order." },
(...)
{ "name": "Assignment", "type": "[]ReplicaAssignment", "versions": "1+",
  "about": "The replicas of this partition, sorted by preferred order.", "fields": [
  { "name": "Broker", "type": "int32", "versions": "1+", "entityType": "brokerId",
    "about": "The broker ID hosting the replica." },
  { "name": "Directory", "type": "uuid", "versions": "1+",
    "about": "The log directory hosting the replica" }
]}
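To make the shape of the new Assignment field more concrete, here is a minimal sketch that models each entry as a (broker, directory) pair and looks up the log directory hosting a given broker's replica. The class and function names, and the use of plain strings for UUIDs, are illustrative assumptions rather than part of the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReplicaAssignment:
    broker: int     # "Broker" field: the broker ID hosting the replica
    directory: str  # "Directory" field: log directory UUID, kept as a string here


def directory_for(assignment: List[ReplicaAssignment], broker_id: int) -> Optional[str]:
    """Return the directory UUID for broker_id, or None if it hosts no replica."""
    for replica in assignment:
        if replica.broker == broker_id:
            return replica.directory
    return None


# Placeholder values; the list order stands for the preferred replica order.
assignment = [
    ReplicaAssignment(broker=1, directory="dir-uuid-a"),
    ReplicaAssignment(broker=2, directory="dir-uuid-b"),
]
```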

...

Having a persisted UUID at the root of each log directory allows the broker to identify the log directory regardless of the mount path.
Having a persisted list of all UUIDs for all configured log directories allows the broker to determine the UUIDs of unavailable (offline) log directories, as the meta.properties files for the offline log directories are likely to be unavailable.
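The reasoning above can be sketched as a set difference: every UUID listed in directory.ids of any readable meta.properties file, minus the directory.id values of the files that could actually be read. The function name and the input shape are assumptions for illustration, not the broker's actual implementation.

```python
from typing import Iterable, List, Set, Tuple


def offline_directory_ids(readable: Iterable[Tuple[str, List[str]]]) -> Set[str]:
    """Compute the UUIDs of log directories that appear to be offline.

    `readable` holds one (directory_id, directory_ids) pair per meta.properties
    file that the broker managed to read (hypothetical input shape).
    """
    known: Set[str] = set()   # every UUID any readable file knows about
    online: Set[str] = set()  # UUIDs of directories whose file was readable
    for directory_id, directory_ids in readable:
        online.add(directory_id)
        known.update(directory_ids)
    return known - online


# Three configured directories; the one with UUID "dir-uuid-c" cannot be read.
all_ids = ["dir-uuid-a", "dir-uuid-b", "dir-uuid-c"]
readable_files = [("dir-uuid-a", all_ids), ("dir-uuid-b", all_ids)]
offline = offline_directory_ids(readable_files)
```

In this example only dir-uuid-c is reported as offline, even though its own meta.properties file was never read.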

...

When the broker starts up and initializes the LogManager, it will load the UUID for each log directory (directory.id) and the list of all log directory UUIDs (directory.ids) by reading the meta.properties file at the root of each log directory.

...