...

Rejected because: in order to maintain the integrity of the config topic, it's imperative that only a single worker be able to access it at a time for a given connector. This could be accomplished by allowing each worker to write to the config topic with a transactional producer whose transactional ID is mapped in a 1:1 fashion from the name of the connector. However, if a rebalancing bug occurs and two non-zombie workers believe they both own the same Connector object, it's unclear how the cluster could gracefully recover from this, and it's likely that manual intervention by the user would be required.
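
As a rough illustration of the rejected scheme, the sketch below derives producer properties whose transactional ID is a 1:1 function of the connector name. The `transactional.id` and `enable.idempotence` keys are standard producer configs; the `connect-config-` prefix and the `ConfigTopicProducerProps` class are hypothetical names chosen for this example and are not part of the proposal.

```java
import java.util.Properties;

final class ConfigTopicProducerProps {
    // Hypothetical prefix, used only in this sketch.
    static final String TRANSACTIONAL_ID_PREFIX = "connect-config-";

    // Prefixing the connector name keeps the mapping 1:1:
    // distinct connector names always yield distinct transactional IDs.
    static String transactionalIdFor(String connectorName) {
        return TRANSACTIONAL_ID_PREFIX + connectorName;
    }

    // Builds the properties a worker might pass to a transactional producer
    // when writing to the config topic on behalf of one connector.
    static Properties forConnector(String bootstrapServers, String connectorName) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("transactional.id", transactionalIdFor(connectorName));
        props.setProperty("enable.idempotence", "true");
        return props;
    }
}
```

Under this mapping, two workers that both believe they own the same connector would construct producers with the same transactional ID, which is the situation the rejection rationale describes as hard to recover from.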

Connector-defined transaction boundaries

Summary: allow connectors to dictate when a transaction should be started, committed, rolled back, etc.

Rejected because: out of scope; can be pursued as an additional feature later on.

Per-connector exactly-once property

...

Rejected because: there is no one-size-fits-all strategy for transaction boundaries that can be expected to accommodate every reasonable combination of connector and use case. Defining transactions on the batches returned from SourceTask::poll would heavily limit throughput for connectors that frequently produce small record batches. Defining transactions on an interval would add a latency penalty to the records at the beginning of each transaction and, in the case of very large transactions, would inflate the memory requirements of downstream consumers (which would have to buffer the entire transaction locally within each topic-partition before beginning to process any of the records in it). Finally, some connectors may have no reasonable way to define their own transaction boundaries at all.
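
The trade-off between the first two strategies can be made concrete with a toy calculation. The sketch below is not Connect code: `TransactionBoundarySketch`, its method names, and the fixed poll period are all assumptions made for illustration. It counts how many transactions each strategy would open for the same sequence of batches, and how many records the largest interval-based transaction would hold.

```java
import java.util.List;

final class TransactionBoundarySketch {

    // Per-poll boundaries: every non-empty batch becomes its own transaction.
    static int transactionsPerPoll(List<Integer> batchSizes) {
        int count = 0;
        for (int size : batchSizes) {
            if (size > 0) {
                count++;
            }
        }
        return count;
    }

    // Interval boundaries: batches accumulate in one open transaction until
    // intervalMs has elapsed, assuming one poll every pollPeriodMs.
    // Returns {transaction count, largest number of records in one transaction}.
    static int[] intervalBoundaries(List<Integer> batchSizes, long pollPeriodMs, long intervalMs) {
        int transactions = 0;
        int largest = 0;
        int open = 0;
        long elapsed = 0;
        for (int size : batchSizes) {
            open += size;
            elapsed += pollPeriodMs;
            if (elapsed >= intervalMs) {
                if (open > 0) {
                    transactions++;
                    largest = Math.max(largest, open);
                }
                open = 0;
                elapsed = 0;
            }
        }
        // Commit whatever is still open at the end of the sequence.
        if (open > 0) {
            transactions++;
            largest = Math.max(largest, open);
        }
        return new int[] {transactions, largest};
    }
}
```

For ten batches of two records each, polled every 100 ms, per-poll boundaries yield ten small transactions, while a 500 ms interval yields two transactions of ten records each: fewer commits, but the first records in each transaction wait longer and a downstream consumer must buffer more before processing.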

Future Work

Per-connector granularity

We may want to allow exactly-once to be enabled on a per-connector basis. This could be implemented by instantiating each source task with a "fencible producer" that uses a transactional ID to write to Kafka, but does not actually produce records inside transactions. This way, if exactly-once is later enabled for that connector, the first round of zombie fencing for it will be able to fence out all prior producer instances, even if they aren't using the traditional transactional producer.
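
The fencing idea can be illustrated with a small in-memory model. This is a toy sketch, not Kafka's API: `FencingSketch` and its methods are hypothetical, and the model only mirrors the general principle that registering a new producer under a transactional ID bumps an epoch and invalidates holders of older epochs.

```java
import java.util.HashMap;
import java.util.Map;

final class FencingSketch {
    // Latest epoch handed out for each transactional ID.
    private final Map<String, Integer> currentEpochs = new HashMap<>();

    // Registers a producer under the given transactional ID and returns its
    // epoch. Each registration bumps the epoch, starting from 1.
    int register(String transactionalId) {
        return currentEpochs.merge(transactionalId, 1, Integer::sum);
    }

    // A producer is fenced once a newer epoch exists for its transactional ID.
    boolean isFenced(String transactionalId, int epoch) {
        return epoch < currentEpochs.getOrDefault(transactionalId, 0);
    }
}
```

In this model, a task that registered before exactly-once was enabled is fenced as soon as its replacement registers under the same transactional ID, regardless of whether the older task ever produced inside a transaction.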

Standalone mode support

Since the design for atomic writes of source records and their offsets relies on source offsets being stored in a Kafka topic, standalone mode is not eligible. If there is sufficient demand, we may add this capability to standalone mode in the future.

...