
Status

Current state: Discarded (KIP-382 supersedes this with a much more complete vision)

Discussion thread: [TBD]

JIRA: KAFKA-6963

Motivation

Copying messages between Kafka clusters is common enough that Kafka has included its own standalone MirrorMaker tool since early in the project. Kafka Connect was introduced to provide a standard framework for moving data into and out of Kafka, and it has been quickly adopted by the community, with a large number of connectors available for other platforms. One such connector that does not exist is a Kafka connector. While MirrorMaker satisfies the basic functional requirement of copying messages between clusters, its standalone nature means that additional work is required to maintain the MirrorMaker cluster, especially in environments where Kafka Connect is already in use.

...

  • At least once delivery to the destination cluster must be supported
  • Can specify a topic whitelist as a regular expression (same as in MirrorMaker)
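The comma-as-choice convention used for the topic whitelist (described under source.topic.whitelist below, where a comma is interpreted as the regex-choice symbol '|') can be illustrated with a short sketch. The helper function here is an assumption for illustration, not the connector's actual implementation:

```python
import re

def whitelist_to_regex(whitelist: str) -> str:
    """Treat each comma-separated entry in the whitelist as one
    alternative in a single regular expression (comma -> '|')."""
    return "|".join(part.strip() for part in whitelist.split(","))

# Hypothetical whitelist combining a literal topic name and a pattern.
pattern = re.compile(whitelist_to_regex("an.interesting.topic, all.of.these.*"))

print(bool(pattern.fullmatch("an.interesting.topic")))  # first alternative
print(bool(pattern.fullmatch("all.of.these.metrics")))  # wildcard alternative
print(bool(pattern.fullmatch("some.other.topic")))      # matches neither
```

Note that, as in MirrorMaker, the entries are regular expressions, so '.' matches any character rather than a literal dot.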

Basic Implementation Components

...

Note that offset checkpointing is managed by Kafka Connect. As it is common to monitor consumer lag using consumer groups, the connector will also commit the last batch of offsets to the source Kafka cluster before polling the consumer for new data, unless this is disabled via configuration (source.enable.auto.commit = false).
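The ordering described above (commit the previous batch to the source cluster, then poll for new data) can be sketched as follows. The stub consumer and function names are assumptions standing in for the real KafkaConsumer, not the connector's actual code:

```python
class StubConsumer:
    """Stand-in for the internal KafkaConsumer; records call order."""
    def __init__(self):
        self.calls = []

    def commit_async(self):
        # Would commit the last consumed offsets back to the source cluster,
        # keeping consumer-group lag monitoring on that cluster meaningful.
        self.calls.append("commit")

    def poll(self, timeout_ms):
        # Would fetch the next batch of records from the source cluster.
        self.calls.append("poll")
        return []

def poll_once(consumer, auto_commit_enabled=True):
    # Commit the previous batch *before* polling for new data, unless
    # source.enable.auto.commit has been set to false.
    if auto_commit_enabled:
        consumer.commit_async()
    return consumer.poll(timeout_ms=1000)

consumer = StubConsumer()
poll_once(consumer)
print(consumer.calls)  # commit happens before poll
```

These source-cluster commits are purely for lag visibility; the offsets used to resume the connector live in the Kafka Connect offset store.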

Configuration Options

...

Standard Options

These are the most common options that are required when configuring this connector:

Configuration Parameter | Example | Description
source.bootstrap.servers | source.broker1:9092,source.broker2:9092 | Mandatory. Comma-separated list of bootstrap servers for the source Kafka cluster
source.topic.whitelist | topic, topic-prefix* | Java regular expression to match topics to mirror. For convenience, comma (',') is interpreted as the regex-choice symbol ('|').
source.auto.offset.reset | latest | If there is no stored offset for a partition, indicates where to start consuming from. Options are "earliest" or "latest". Default: earliest
source.group.id | kafka-connect | Group ID used when writing offsets back to the source cluster (for offset lag tracking)
destination.topics.prefix | aggregate. | Prefix to add to source topic names when determining the Kafka topic to publish data to
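Putting the standard options together, a minimal connector configuration could look like the following. This is an illustrative sketch; the connector name and class name are assumptions, and the values mirror the examples in the table above:

```properties
name=source-cluster-mirror
# Assumed connector class name, for illustration only
connector.class=KafkaSourceConnector
tasks.max=4
source.bootstrap.servers=source.broker1:9092,source.broker2:9092
source.topic.whitelist=topic, topic-prefix*
source.auto.offset.reset=latest
source.group.id=kafka-connect
destination.topics.prefix=aggregate.
```

With destination.topics.prefix=aggregate., a source topic named "topic" would be written to "aggregate.topic" on the destination cluster.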

...

Some use cases may require modifying the following default connector options. Use with care. 

Configuration Parameter | Default | Description
include.message.headers | true | Indicates whether message headers from source records should be included when delivered to the destination cluster.
partition.monitor.topic.list.timeout.ms | 60000 | Amount of time (in milliseconds) the partition monitor thread should wait for the source Kafka cluster to return topic information before logging a timeout error.
partition.monitor.poll.interval.ms | 300000 | Amount of time (in milliseconds) the partition monitor will wait before re-querying the source cluster for a change in the topic partitions to be consumed.
partition.monitor.reconfigure.tasks.on.leader.change | false | Indicates whether the partition monitor should request a task reconfiguration when partition leaders have changed. In some cases this may be a minor optimization: when generating task configurations, the connector will try to group the partitions consumed by each task by leader node. The downside is that it may result in additional rebalances.
poll.loop.timeout.ms | 1000 | Maximum amount of time (in milliseconds) the connector will wait in each poll loop without data before returning control to the Kafka Connect task thread.
max.shutdown.wait.ms | 2000 | Maximum amount of time (in milliseconds) to wait for the connector to gracefully shut down before forcing the consumer and admin clients to close. Note that any value greater than the Kafka Connect parameter task.shutdown.graceful.timeout.ms will not have any effect.
source.max.poll.records | 500 | Maximum number of records to return from each poll of the internal KafkaConsumer. When dealing with topics with very large messages, the connector may sometimes spend too long processing each batch of records, causing lag in offset commits or, in serious cases, unnecessary consumer rebalances. Reducing this value can help in these scenarios. Conversely, when processing very small messages, increasing this value may improve overall throughput.
source.key.deserializer | org.apache.kafka.common.serialization.ByteArrayDeserializer | Key deserializer to use for the kafka consumers connecting to the source cluster.
source.value.deserializer | org.apache.kafka.common.serialization.ByteArrayDeserializer | Value deserializer to use for the kafka consumers connecting to the source cluster.
source.enable.auto.commit | true | If true, the consumer's offsets will be periodically committed to the source cluster in the background. Note that these offsets are not used to resume the connector (they are stored in the Kafka Connect offset store), but they may be useful for monitoring the current offset lag of this connector on the source cluster.

Overriding the internal KafkaConsumer and AdminClient Configuration

Note that standard Kafka parameters can be passed to the internal KafkaConsumer and AdminClient by prefixing the standard consumer configuration parameters with "source.".

For cases where the configuration for the KafkaConsumer and AdminClient diverges, you can use the more explicit "sourceconnector.consumer." and "sourceconnector.admin." configuration parameter prefixes to fine-tune the settings used for each. This is especially useful in cases where you may need to pass authentication parameters to these clients.
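For example, the prefixes could be combined as follows to point both internal clients at a secured source cluster while giving the consumer and admin client different settings. The property values here are illustrative, not recommendations:

```properties
# Applied to both the internal KafkaConsumer and AdminClient
source.security.protocol=SASL_SSL
source.sasl.mechanism=PLAIN

# Consumer-only override via the explicit prefix
sourceconnector.consumer.max.partition.fetch.bytes=2097152

# AdminClient-only override via the explicit prefix
sourceconnector.admin.request.timeout.ms=30000
```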

Compatibility, Deprecation, and Migration Plan

...