...

JBOD is coming to KRaft in KIP-858: Handle JBOD broker disk failure in KRaft! It has been a feature in Kafka since KIP-112: Handle disk failure for JBOD. KIP-858 changes the notification and persistence mechanisms for disk failures rather than the sequence of steps the controller and broker take to prevent future interactions with the affected disk. What we propose in this KIP is to treat one kind of disk failure, a disk becoming full, in a way that allows space to be freed using Kafka functionality such as modifying retention periods or deleting problematic topics. The examples in this KIP use a ZooKeeper-backed Kafka cluster, but we believe the functionality will be easily implementable in KRaft once KIP-858 is code-complete.

...

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

A public interface is any change to the following:

...

Binary log format

...

The network protocol and api behavior

...

Configuration, especially client configuration

Any class in the public packages under clients

  • org/apache/kafka/common/serialization

  • org/apache/kafka/common

  • org/apache/kafka/common/errors

  • org/apache/kafka/clients/producer

  • org/apache/kafka/clients/consumer (eventually, once stable)

...

Monitoring

...

Command line tools and arguments

...

No foreseen changes to public-facing interfaces.

Proposed Changes

Current state

...

We will add a new state, saturated, to two broker-side state machines: that of a log directory and that of a partition replica. The partition state machine is known only to the broker and will not be replicated to the controller. We need these additional states to restrict which background tasks operate on saturated log directories and partitions. Without a separate state, we have no way to tell Kafka that deletion and retention should continue working on them.
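
As an illustration of why a separate state is useful, here is a minimal sketch of a log directory state that gates background tasks. All names (`LogDirState`, `BackgroundTask`, `permits`) are hypothetical and do not correspond to Kafka's actual classes. The source only states that deletion and retention keep working on saturated directories, so the set of tasks blocked in the saturated state is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical task names, used for illustration only.
enum BackgroundTask { APPEND, REPLICA_FETCH, RETENTION, DELETION }

// Hypothetical log directory states; SATURATED is the state this KIP proposes.
enum LogDirState {
    // A healthy directory accepts every task.
    ONLINE(EnumSet.allOf(BackgroundTask.class)),
    // A full directory only accepts tasks that free up space.
    // Assumption: everything other than retention and deletion is blocked.
    SATURATED(EnumSet.of(BackgroundTask.RETENTION, BackgroundTask.DELETION)),
    // A failed directory accepts nothing.
    OFFLINE(EnumSet.noneOf(BackgroundTask.class));

    private final Set<BackgroundTask> allowed;

    LogDirState(Set<BackgroundTask> allowed) {
        this.allowed = allowed;
    }

    // Returns true if the given task may run on a directory in this state.
    boolean permits(BackgroundTask task) {
        return allowed.contains(task);
    }
}
```

With only online and offline states, a full directory would have to be treated as offline, which would also stop the retention and deletion tasks that could free space on it.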

Notification mechanism - ZooKeeper/KRaft Controller
No changes will be introduced to ZooKeeper. We continue to use it only as a notification mechanism.

Controller
Instead of sending a delete topic request only to replicas we know to be online, the controller will be allowed to send it to all replicas regardless of their state. Previously, the controller did not send delete topic requests to brokers whose replicas were not online because it knew they would fail. With this change, topic deletions for saturated topics will succeed, but topic deletions in the offline scenario will continue to fail.
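
The change in targeting can be sketched as follows. This is a simplified model under stated assumptions, not Kafka's controller code: `DeleteTopicSketch`, `Replica`, `currentTargets`, `proposedTargets` and `deletionSucceeds` are hypothetical names. The `state` field stands for the replica's actual broker-side state. The controller itself only distinguishes online replicas from the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of delete topic targeting, for illustration only.
final class DeleteTopicSketch {
    enum ReplicaState { ONLINE, SATURATED, OFFLINE }

    static final class Replica {
        final int brokerId;
        final ReplicaState state;

        Replica(int brokerId, ReplicaState state) {
            this.brokerId = brokerId;
            this.state = state;
        }
    }

    // Current behaviour: only replicas known to be online receive the request.
    static List<Integer> currentTargets(List<Replica> replicas) {
        List<Integer> targets = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.state == ReplicaState.ONLINE) {
                targets.add(r.brokerId);
            }
        }
        return targets;
    }

    // Proposed behaviour: every replica receives the request, whatever its state.
    static List<Integer> proposedTargets(List<Replica> replicas) {
        List<Integer> targets = new ArrayList<>();
        for (Replica r : replicas) {
            targets.add(r.brokerId);
        }
        return targets;
    }

    // Expected outcome on the broker: only the offline case still fails.
    static boolean deletionSucceeds(ReplicaState state) {
        return state != ReplicaState.OFFLINE;
    }
}
```

In this model a saturated replica is skipped by `currentTargets` but reached by `proposedTargets`, which is what lets a topic deletion free space on a full disk.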

...

Describe in a few sentences how the KIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

We will implement new integration and system tests that artificially constrain the space available to the Kafka log directories in order to gain confidence in the behaviour of the system.

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.

...