Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Require as few per-connector changes as possible. The Connect ecosystem is fairly mature at this point and even something as lightweight as a new SourceTask  method would have to be applied to potentially dozens if not hundreds of connectors. The current proposal requires no such changes for existing connectors, as long as they use they use the source offsets API correctly.
  • Minimize configuration changes. The implementation of this feature may grow fairly complex, but it should be easy for users to enable and understand. KIP-98, for example, added exactly-once support for Kafka and its Java clients, and only exposed three producer properties and one consumer property to users. The current proposal only adds two user-facing configuration properties.
  • Minimize operational burden on users. It's likely that this feature will be used heavily and may even be enabled by default with a later release of Connect. A wide range of Connect deployment styles and environments should be supported with as little pain as possible, including but not limited to security-conscious and resource-conscious environments. The current proposal's requirements in security-conscious environments are well-defined and easy-to-anticipate, and it does not significantly alter the resource allocation story of Connect.
  • Minimize gotchas and potential footguns. Nobody likes fine print. If we can address a common use case for Connect users or Connect developers, we should; leaving things out for the sake of a simpler design or smaller changes should only be done as a last resort. Even if well-documented, known gaps in use cases (especially those that vary from connector to connector) may not be propagated to or understood by end users, and may be seen as a fault in the quality of this feature.
  • Overall, make this a feature that gives people joy to use, not pain. Jury's out on this one until the vote thread passes!

Public Interfaces

New properties

A single worker-level configuration property will be added for distributed mode:

...

Name

Type

Default

Importance

Doc

offsets.storage.topic 

STRING 

null 

LOW 

The name of a separate offsets topic to use for this connector. If empty or not specified, the worker’s global offsets topic name will be used. If specified, the offsets topic will be created if it does not already exist on the Kafka cluster targeted by this connector (which may be different from the one used for the worker's global offsets topic if the bootstrap.servers property of the connector's producer has been overridden from the worker's).

New metrics

Three new task-level JMX properties will be added:

MBean namekafka.connect:type=source-task-metrics,connector=([-.\w]+),task=([\d]+)

Metric nameDescription
transaction-size-min The number of records in the smallest transaction the task has committed so far.
transaction-size-max The number of records in the largest transaction the task has committed so far.
transaction-size-avg The average number of records in the transactions the task has committed so far.

Proposed Changes

Atomic offset writes

...