Please help us keep this FAQ up-to-date. If there is an answer that you think can be improved, please help improve it. If you look for an answer that isn't here, and later figure it out, please add it. You don't need permission, it's a wiki.

Table of Contents

...

Exactly-Once Processing

What is the difference between an "idempotent producer" and a "transactional producer"?

An idempotent producer guarantees that single messages don't end up as duplicates in case a write is internally retried by the producer. A transactional producer allows you to write multiple messages into different partitions across multiple topics atomically. Note: if you use transactions, you automatically get idempotent writes, too.

...

You need to provide a cluster-wide unique `transactional.id` for the producer and use the corresponding transactional producer calls (i.e., initTransactions(), beginTransaction(), commitTransaction(), etc.).
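As a rough illustration of the configuration described above, the following sketch builds the client settings for an idempotent producer and for a transactional producer using only `java.util.Properties`. The class name, method names, broker address and `transactional.id` value are made up for the example; the config keys (`enable.idempotence`, `acks`, `transactional.id`) are the standard producer settings. It does not connect to a cluster.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): assembles producer settings only.
final class ProducerConfigSketch {

    // Settings for an idempotent (non-transactional) producer.
    static Properties idempotent(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");
        props.put("acks", "all"); // idempotence requires acks=all
        return props;
    }

    // Settings for a transactional producer: same as above plus a transactional.id.
    static Properties transactional(String bootstrapServers, String transactionalId) {
        Properties props = idempotent(bootstrapServers);
        // Must be unique across the cluster and stay the same across restarts.
        props.put("transactional.id", transactionalId);
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "orders-service-tx-1" are placeholder values.
        Properties props = transactional("localhost:9092", "orders-service-tx-1");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        // With the Kafka client on the classpath, these settings would be passed to
        // new KafkaProducer<>(props); the application then calls initTransactions()
        // once, and beginTransaction() / send() / commitTransaction() per transaction
        // (abortTransaction() on a recoverable failure).
    }
}
```

Building the transactional settings on top of the idempotent ones mirrors the note above that transactions include idempotent writes.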

What are PIDs and sequence numbers and how are they related to `transactional.id`?

If a producer is configured for idempotent writes, it is assigned a cluster-wide unique PID (producer ID). The producer also attaches a sequence number to every message it writes, starting with sequence number zero. Different producers use the same sequence numbers. However, each PID/sequence-number pair is globally unique, which allows brokers to identify duplicate writes (and filter/drop them). If an idempotent producer is stopped and restarted, it is assigned a new PID, i.e., PIDs don't "survive" a restart.
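The duplicate detection described above can be pictured with a small toy model. This is not Kafka's broker code: the class and method names are invented, it models a single log, and it simply drops any write whose sequence number is not higher than the last one seen for that PID (it does not model gaps or out-of-order handling).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a log that filters duplicate writes by (PID, sequence number).
final class DedupLogSketch {
    // Highest sequence number accepted so far, per producer id.
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Appends the value unless this (pid, seq) pair was already written.
    // Returns true if the value was appended, false if it was dropped.
    boolean append(long pid, int seq, String value) {
        int last = lastSeqByPid.getOrDefault(pid, -1);
        if (seq <= last) {
            return false; // duplicate caused by a producer-internal retry
        }
        lastSeqByPid.put(pid, seq);
        log.add(value);
        return true;
    }

    // Returns a copy of the accepted values in write order.
    List<String> contents() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        DedupLogSketch partition = new DedupLogSketch();
        partition.append(1L, 0, "a"); // first write from PID 1
        partition.append(1L, 0, "a"); // retry of the same write: dropped
        partition.append(2L, 0, "b"); // same sequence number, different PID: accepted
        System.out.println(partition.contents()); // prints [a, b]
    }
}
```

Because the key is the pair rather than the sequence number alone, two producers can both start at zero without their writes being mistaken for duplicates.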

A `transactional.id` is a user config, and thus the same `transactional.id` is used after a producer restart. This allows brokers to identify the same producer across producer restarts. This identification is required to guarantee consistency in case of a failure: if a producer has an open transaction and crashes, on producer restart the brokers can detect the open transaction and abort it automatically.
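A toy model may help picture the recovery behavior described above. It is a deliberately simplified sketch with invented names, not how Kafka's transaction coordinator is implemented: it only remembers, per `transactional.id`, whether a transaction is open, and aborts that transaction when a producer with the same id registers again.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: tracks open transactions by transactional.id.
final class TxCoordinatorSketch {
    // transactional.id -> true while a transaction is open.
    private final Map<String, Boolean> openByTxId = new HashMap<>();

    // Called when a producer starts (or restarts) with the given id.
    // Aborts a transaction left open by a previous instance; returns true if one was aborted.
    boolean register(String transactionalId) {
        boolean wasOpen = openByTxId.getOrDefault(transactionalId, false);
        openByTxId.put(transactionalId, false);
        return wasOpen;
    }

    void begin(String transactionalId) {
        openByTxId.put(transactionalId, true);
    }

    void commit(String transactionalId) {
        openByTxId.put(transactionalId, false);
    }

    boolean hasOpenTransaction(String transactionalId) {
        return openByTxId.getOrDefault(transactionalId, false);
    }

    public static void main(String[] args) {
        TxCoordinatorSketch coordinator = new TxCoordinatorSketch();
        coordinator.register("orders-service-tx-1"); // placeholder id
        coordinator.begin("orders-service-tx-1");
        // The producer crashes here without committing, then restarts with the same id.
        boolean aborted = coordinator.register("orders-service-tx-1");
        System.out.println("aborted stale transaction: " + aborted); // prints aborted stale transaction: true
    }
}
```

The lookup only works because the id is the same before and after the restart, which is why a PID, being reassigned on restart, cannot serve this purpose.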

...

You only need to configure a consumer with `isolation.level="read_committed"` if the topic contains transactional data, i.e., was written by a transactional producer. If data is written with an idempotent producer, no transactions are used, and thus using `read_uncommitted` or `read_committed` for the consumer does not make any difference.
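For reference, a minimal sketch of consumer settings with `isolation.level` set, again using only `java.util.Properties`. The class name, broker address and group id are placeholders chosen for the example; `isolation.level` and its two values are the standard consumer config.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): assembles consumer settings only.
final class ConsumerConfigSketch {

    // readCommitted = true hides records of open or aborted transactions.
    static Properties build(String bootstrapServers, String groupId, boolean readCommitted) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // "read_uncommitted" is the consumer default.
        props.put("isolation.level", readCommitted ? "read_committed" : "read_uncommitted");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; with the Kafka client on the classpath these settings
        // would be passed to new KafkaConsumer<>(props).
        Properties props = build("localhost:9092", "orders-readers", true);
        System.out.println(props.getProperty("isolation.level")); // prints read_committed
    }
}
```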

...