...

KIP-98 added support for exactly-once delivery guarantees to Kafka and its Java clients.

The idempotent producer

One common source of duplicate records from applications that write to Kafka was automatic producer retries, where a producer would re-send a batch to Kafka under certain circumstances even if that batch had already been committed by the broker. KIP-98 allowed users to configure their producer to perform these retries idempotently, so that downstream applications see only one instance of each message in an automatically re-sent producer batch, even if the producer sent that batch to Kafka several times.
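
For illustration, enabling idempotent retries on a plain Java producer is a one-line configuration change. This is a minimal sketch; the bootstrap server and topic name are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IdempotentProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // With idempotence enabled, the broker uses the producer ID and
            // per-partition sequence numbers to discard duplicate batches
            // that arrive from automatic retries.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("example-topic", "key", "value"));
            }
        }
    }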

The transactional producer

Another major component of the user-facing changes from that KIP was the introduction of the transactional producer, which can create transactions that span multiple topic partitions and perform atomic writes to them. Furthermore, the transactional producer is capable of "fencing out" other producers: it claims ownership of a transactional ID and bars any prior producer instance with the same transactional ID from making subsequent writes to Kafka. This is especially useful for preventing zombie instances of a client application from sending duplicate messages.
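
A minimal sketch of the transactional producer API follows; the broker address, topic names, and transactional ID are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.ProducerFencedException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Setting a transactional ID implicitly enables idempotence as well.
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-transactional-id");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // initTransactions() claims ownership of the transactional ID and
            // fences out any prior producer instance using the same ID.
            producer.initTransactions();
            try {
                producer.beginTransaction();
                // Writes to multiple topic partitions commit atomically.
                producer.send(new ProducerRecord<>("topic-a", "key", "value"));
                producer.send(new ProducerRecord<>("topic-b", "key", "value"));
                producer.commitTransaction();
            } catch (ProducerFencedException e) {
                // A newer producer instance has claimed this transactional ID;
                // this instance is a zombie and must stop writing.
                producer.close();
            }
        }
    }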

...

Zombies: Some scenarios can cause multiple tasks to run and produce data for the same source partition (such as a database table or a Kafka topic partition) at the same time. This can also cause the framework to produce duplicate records.

Another common source of duplicate records from source connectors is automatic producer retries. However, users can already address this by configuring source connectors to use an idempotent producer, via the worker-level producer.enable.idempotence property or the connector-level producer.override.enable.idempotence property.
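
For illustration, the idempotent producer can be enabled for every connector in the worker configuration:

    producer.enable.idempotence=true

or for a single connector via an override in that connector's JSON configuration. The connector name and class below are placeholders, and producer.override.* properties only take effect if the worker's connector.client.config.override.policy permits overrides:

    {
      "name": "example-source-connector",
      "config": {
        "connector.class": "com.example.ExampleSourceConnector",
        "producer.override.enable.idempotence": "true"
      }
    }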

In order to support exactly-once delivery guarantees for source connectors, the framework should be expanded to atomically write source records and their source offsets to Kafka, and to prevent zombie tasks from producing data to Kafka.
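
As a rough sketch of what such an atomic write could look like, the framework might wrap each batch of source records and their offsets in a single producer transaction. The class, method, and helper names below are hypothetical stand-ins rather than the framework's actual API, and the offsets topic name is illustrative:

    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.connect.source.SourceRecord;

    public class AtomicOffsetWriteSketch {
        static void commitBatchAtomically(KafkaProducer<byte[], byte[]> producer,
                                          List<SourceRecord> batch,
                                          String offsetsTopic) {
            producer.beginTransaction();
            for (SourceRecord record : batch) {
                // Write the source record itself...
                producer.send(toProducerRecord(record));
                // ...and its source offset, inside the same transaction.
                producer.send(new ProducerRecord<>(offsetsTopic,
                        serializeOffsetKey(record.sourcePartition()),
                        serializeOffsetValue(record.sourceOffset())));
            }
            // Either every record and its offset become visible together, or none do.
            producer.commitTransaction();
        }

        // Hypothetical conversion stubs; a real implementation would use the
        // worker's key/value converters to serialize records and offsets.
        static ProducerRecord<byte[], byte[]> toProducerRecord(SourceRecord record) {
            throw new UnsupportedOperationException("illustrative stub");
        }

        static byte[] serializeOffsetKey(Map<String, ?> sourcePartition) {
            throw new UnsupportedOperationException("illustrative stub");
        }

        static byte[] serializeOffsetValue(Map<String, ?> sourceOffset) {
            throw new UnsupportedOperationException("illustrative stub");
        }
    }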

...