...

Kafka's P99 latency is already stable and low. Providing similar guarantees for P999 would make Kafka suitable for more use cases.

Public Interfaces

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

A public interface is any change to the following:

...

Binary log format

...

The network protocol and api behavior

...

Configuration, especially client configuration

Any class in the public packages under clients

  • org/apache/kafka/common/serialization

  • org/apache/kafka/common

  • org/apache/kafka/common/errors

  • org/apache/kafka/clients/producer

  • org/apache/kafka/clients/consumer (eventually, once stable)

...

Monitoring

...

Command line tools and arguments

...

There will be a new config inside clients/producer called "quorum.required.acks".
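As a rough illustration of how the proposed setting might appear in producer configuration, here is a minimal sketch using only `java.util.Properties`. The key "quorum.required.acks" is the one proposed in this section and is not recognized by released Kafka clients. Treating its value as the number of replicas that must acknowledge a write is an assumption made by analogy with "request.required.acks"; the class name, method name, and broker address are hypothetical.

```java
import java.util.Properties;

public class QuorumAcksConfigSketch {
    // Key proposed in this KIP; released Kafka clients do not define it.
    public static final String QUORUM_REQUIRED_ACKS = "quorum.required.acks";

    // Builds producer properties that include the proposed setting.
    public static Properties producerProps(int quorumAcks) {
        Properties props = new Properties();
        // Placeholder broker address for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Assumption: the value is the number of replicas that must acknowledge.
        props.setProperty(QUORUM_REQUIRED_ACKS, Integer.toString(quorumAcks));
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps(2);
        System.out.println(QUORUM_REQUIRED_ACKS + "="
                + props.getProperty(QUORUM_REQUIRED_ACKS));
    }
}
```

The actual value type and semantics of the config would be defined by the final design, so the integer here should be read as a placeholder.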

Proposed Changes

I understand that in the early days Kafka supported specifying "request.required.acks" or "acks=2" on the producer side. That support was removed because it was misleading and could not guarantee against data loss in all scenarios. The proposed changes here contain two separate parts: (1) introduce a producer-side config similar to "request.required.acks", but with stricter requirements; (2) improve the new leader election process to achieve the same level of data durability.

...

Compatibility, Deprecation, and Migration Plan

  • No impact on existing users.
  • No need to phase out the older behavior.
  • No special migration tools required.
  • No need to remove any existing behaviors.

Rejected Alternatives

I received some recommendations from offline discussions. One of them is to set replicas=2 and acks=-1, so that the leader waits for only one follower to fetch the pending message; however, in the experiment the P999 was still very spiky.
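For reference, the rejected alternative can be sketched as the configuration below. `acks` is an existing producer config, and "-1" is equivalent to "all". The replication factor is a topic-level setting, shown here only as a constant. The class name, helper method, and broker address are hypothetical, and the follower count assumes every replica is in sync.

```java
import java.util.Properties;

public class RejectedAlternativeSketch {
    // Topic replication factor from the text: one leader plus one follower.
    public static final short REPLICATION_FACTOR = 2;

    // Producer properties for the alternative: wait for all in-sync replicas.
    public static Properties producerProps() {
        Properties props = new Properties();
        // Placeholder broker address for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("acks", "-1");
        return props;
    }

    // Followers the leader waits for, assuming all replicas are in sync.
    public static int followersToWaitFor(int replicationFactor) {
        return replicationFactor - 1;
    }

    public static void main(String[] args) {
        System.out.println("acks=" + producerProps().getProperty("acks"));
        System.out.println("followers to wait for: "
                + followersToWaitFor(REPLICATION_FACTOR));
    }
}
```

With these settings every produce request still depends on that single follower, so a slow fetch from it shows up directly in tail latency, which is consistent with the spiky P999 observed.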