You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Kafka currently provides at-least-once messaging guarantees. Duplicates can arise due to either producer retries or consumer restarts after failure. One way to provide exactly-once messaging semantics is to implement an idempotent producer. This has been covered at length in the proposal for an Idempotent Producer. An alternative and more general approach is to support transactional messaging. This can enable use-cases such as replicated logging for transactional data services in addition to the classic idempotent producer use-cases.

What is transactional messaging?

Producers can explicitly initiate transactional sessions, send (transactional) messages within those sessions and either commit or abort the transaction. The guarantees that we aim to achieve for transactions are perhaps best understood by enumerating the functional requirements.

  1. A consumer's application should not be exposed to messages from uncommitted transactions.
  2. The broker cannot lose any committed transactions.
  3. There should be no duplicate messages within transactions.
  4. Transaction ordering within a partition: A transaction-aware consumer should see transactions in the original transaction-order within each partition.

Furthermore, we have the following implementation requirements:

  1. The implementation should be scalable. E.g., a dedicated log per transaction is unacceptable.
  2. Performance:
    1. The throughput of a transactional producer should be comparable to that of a non-transactional producer.
    2. Acceptable latency. E.g., avoid copying the transactional data as much as possible.
    3. Any implementation should not make the partition unavailable (say, due to locking) for an unreasonable period of time.
  3. Client simplicity: Favor a scheme that lend to a simpler client-side implementation (even if it adds more complexity to the broker). For example, it is acceptable (but not ideal) for a consumer implementation to (internally) buffer and subsequently discard messages from uncommitted transactions. i.e., if the chosen implementation allows the broker to materialize messages from uncommitted transactions in the data logs.
  4. Interleaving: Each partition should be able to accept messages from both transactional and non-transactional producers.
  • No labels