Status
Current state: Draft
Discussion thread:
JIRA
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
MirrorMaker2 is currently implemented on Kafka Connect Framework, more specifically the Source Connector / Task, which do not provide exactly-once semantics (EOS) out-of-the-box, as discussed in https://github.com/confluentinc/kafka-connect-jdbc/issues/461, https://github.com/apache/kafka/pull/5553, and .Therefore MirrorMaker2 currently does not provide EOS.
This proposal is to provide an option to enable EOS if no data loss or duplicate data is preferred. The key idea behind this proposal is to extend SinkTask with a brand new MirrorSinkTask implementation, which provides an option to
(1) create a transactional producer and commit consumer offset within an transaction by the producer
OR
(2) create a non-transactional producer and consumer offsets are committed by the consumer itself separately
The reasons to extend SinkTask:
- as mentioned above, Kafka Connect Source Connector / Task do not provide EOS by nature
- MirrorSinkTask (that extends SinkTask) can use a config to use transactional producer or non-transactional producer to produce to the downstream kafka cluster
- MirrorSinkTask can return an empty result from preCommit()(defined in SinkTask) by overriding it to disable the consumer offset commit in WorkerSinkTask, so that the transactional producer is able to commit consumer offset within one transaction.
Public Interfaces
New classes and interfaces include:
MirrorSinkConnector, MirrorSinkTask classes
Name | Type | Default | Doc |
---|---|---|---|
transaction.producer | boolean | false | if True, EOS is enabled between consumer and producer |
Proposed Changes
MirrorSinkTask
Since MirrorSinkTask is proposed to to extend SinkTask,