Status
Current state: Voting
Discussion thread: here
JIRA: KAFKA-7500
...
- Leverages the Kafka Connect framework and ecosystem.
- Includes both source and sink connectors.
- Includes a high-level driver that manages connectors in a dedicated cluster.
- High-level REST API abstracts over connectors between multiple Kafka clusters.
- Detects new topics, partitions.
- Automatically syncs topic configuration between clusters.
- Manages downstream topic ACL.
- Supports "active/active" cluster pairs, as well as any number of active clusters.
- Supports cross-datacenter replication, aggregation, and other complex topologies.
- Provides new metrics including end-to-end replication latency across multiple data centers/clusters.
- Emits offsets required to migrate consumers between clusters.
- Tooling for offset translation.
- MirrorMaker-compatible legacy mode.
...
- MirrorSourceConnector, MirrorSinkConnector, MirrorSourceTask, MirrorSinkTask classes.
- MirrorCheckpointConnector, MirrorCheckpointTask.
- MirrorHeartbeatConnector, MirrorHeartbeatTask.
- MirrorConnectorConfig, MirrorTaskConfig classes.
- ReplicationPolicy interface. DefaultReplicationPolicy and LegacyReplicationPolicy classes.
- Heartbeat and checkpoint topics and associated schemas.
- RemoteClusterUtils class for querying remote cluster reachability and lag, and for translating consumer offsets between clusters.
- MirrorMaker driver class with main entry point for running MM2 cluster nodes.
- MirrorMakerConfig used by MirrorMaker driver.
- ./bin/connect-mirror-maker.sh and ./config/mirror-maker.properties sample configuration.
- MirrorMaker high-level REST API.
Proposed Changes
Remote Topics, Partitions
...
The MirrorMaker.java driver class and ./bin/connect-mirror-maker.sh script implement a distributed MM2 cluster which does not depend on an existing Connect cluster. Instead, MM2 cluster nodes manage Connect workers internally based on a high-level configuration file and REST API. The configuration file is needed to identify each Kafka cluster. A sample MirrorMakerConfig properties file will be provided in ./config/mirror-maker.properties:
...
Code Block |
---|
name = MirrorSourceConnector
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
source.cluster.alias = primary
target.cluster.alias = backup
source.cluster.broker.list = localhost:9091
target.cluster.broker.list = localhost:9092
key.converter.class = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter.class = org.apache.kafka.connect.converters.ByteArrayConverter |
Generally, a single connector of each type (MirrorSourceConnector, MirrorCheckpointConnector, MirrorHeartbeatConnector) is needed for each source→target flow, so the class name (e.g. MirrorSourceConnector) is used as the connector's "name".
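As a rough illustration of this naming convention, the sketch below builds the properties for one source→target flow and derives the connector's "name" from its class name. The helper names `mirror_source_config` and `render_properties` are hypothetical and not part of MM2; the property keys are the ones shown in the sample above.

```python
# Hypothetical helpers, for illustration only; not part of MirrorMaker.
CONNECTOR_CLASS = "org.apache.kafka.connect.mirror.MirrorSourceConnector"
BYTE_ARRAY_CONVERTER = "org.apache.kafka.connect.converters.ByteArrayConverter"


def mirror_source_config(source, target, source_brokers, target_brokers):
    """Build the connector properties for a single source->target flow."""
    return {
        # The simple class name doubles as the connector's "name".
        "name": CONNECTOR_CLASS.rsplit(".", 1)[-1],
        "connector.class": CONNECTOR_CLASS,
        "source.cluster.alias": source,
        "target.cluster.alias": target,
        "source.cluster.broker.list": source_brokers,
        "target.cluster.broker.list": target_brokers,
        "key.converter.class": BYTE_ARRAY_CONVERTER,
        "value.converter.class": BYTE_ARRAY_CONVERTER,
    }


def render_properties(config):
    """Render a dict as 'key = value' lines, preserving insertion order."""
    return "\n".join(f"{key} = {value}" for key, value in config.items())


flow_config = mirror_source_config(
    "primary", "backup", "localhost:9091", "localhost:9092"
)
print(render_properties(flow_config))
```

Because each flow has its own source and target aliases, reusing the class name as the connector name should not collide within a single flow.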
At launch, each such connector is configured to replicate no topics or groups (effectively idle) until these properties are set via the REST API. Alternatively, the MirrorMaker properties file can specify static configuration properties for each connector, which avoids using the REST API:
Code Block |
---|
clusters = primary, backup
cluster.primary.broker.list = localhost:9091
cluster.backup.broker.list = localhost:9092
connector.primary->backup.topics = .*
connector.primary->backup.emit.heartbeats = false |
...
property | default value | description |
---|---|---|
clusters | required | comma-separated list of Kafka cluster "aliases" |
cluster.cluster.broker.list | required | connection information for the specific cluster |
cluster.cluster.x.y.z | n/a | passed to workers for a specific cluster |
connector.source->target.x.y.z | n/a | passed to a specific connector |
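To make the key conventions in this table concrete, here is a minimal sketch of how such a properties file could be split into a cluster list, per-cluster settings, and per-connector settings. This is not how MirrorMakerConfig is implemented; the function name `parse_mm2_properties` is hypothetical, and the sketch assumes cluster aliases contain no dots.

```python
import re

# Assumption: cluster aliases contain no "." characters.
CLUSTER_KEY = re.compile(r"^cluster\.([^.]+)\.(.+)$")
CONNECTOR_KEY = re.compile(r"^connector\.([^.]+)->([^.]+)\.(.+)$")


def parse_mm2_properties(text):
    """Group 'key = value' lines by the prefixes described in the table."""
    result = {"clusters": [], "cluster": {}, "connector": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "clusters":
            result["clusters"] = [a.strip() for a in value.split(",") if a.strip()]
            continue
        match = CONNECTOR_KEY.match(key)
        if match:
            source, target, prop = match.groups()
            result["connector"].setdefault((source, target), {})[prop] = value
            continue
        match = CLUSTER_KEY.match(key)
        if match:
            alias, prop = match.groups()
            result["cluster"].setdefault(alias, {})[prop] = value
    return result


SAMPLE = """\
clusters = primary, backup
cluster.primary.broker.list = localhost:9091
cluster.backup.broker.list = localhost:9092
connector.primary->backup.topics = .*
connector.primary->backup.emit.heartbeats = false
"""

parsed = parse_mm2_properties(SAMPLE)
print(parsed["clusters"])
print(parsed["connector"][("primary", "backup")])
```

Values are kept as strings here; type conversion (for example of `emit.heartbeats`) is left to whatever consumes the parsed result.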
MirrorMaker REST API
To enable remote control of a dedicated MirrorMaker cluster, a high-level REST API is provided. The REST API includes a subset of the full Connect REST API, providing access to the underlying Connectors.
As with the Connect REST API, configurations can be updated via a PUT request:
Code Block |
---|
PUT /from:us-west/to:us-east/connectors/MirrorSourceConnector/config HTTP/1.1
Host: localhost
Accept: application/json
Content-Type: application/json

{
  "topics": ".*"
} |
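The same request could be assembled from a client along the following lines. This is only a sketch using the Python standard library: the helper name `build_config_update` and the base URL are assumptions, and the request is constructed but not sent, since the endpoint layout here is the proposed one rather than a released API.

```python
import json
import urllib.request


def build_config_update(base_url, source, target, connector, config):
    """Build (but do not send) a PUT request that updates a connector's config."""
    # Path layout follows the example request: /from:<source>/to:<target>/...
    path = f"/from:{source}/to:{target}/connectors/{connector}/config"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(config).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


# Assumed base URL; the example request only specifies "Host: localhost".
request = build_config_update(
    "http://localhost", "us-west", "us-east", "MirrorSourceConnector", {"topics": ".*"}
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running MirrorMaker node.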
The following endpoints are supported:
...
Walkthrough: Running MirrorMaker 2.0
...
In this mode, MirrorMaker does not require an existing Connect cluster. Instead, a high-level driver and API manage a collection of Connect workers.
...
Second, launch one or more MirrorMaker cluster nodes:
$ ./bin/connect-mirror-maker.sh mm2.properties
Finally, remote-control the MirrorMaker cluster using the REST API or CLI. The REST API provides a subset of the Connect REST API, including the ability to start, stop, and reconfigure connectors. For example:
...
Running a standalone MirrorMaker connector
...
- We could release this as an independent project, but we feel that cluster replication should be one of Kafka's fundamental features.
- We could deprecate MirrorMaker but keep it around. However, we see this as a natural evolution of MirrorMaker, not an alternative solution.
- We could update MirrorMaker rather than completely rewrite it. However, we'd end up recreating many features provided by the Connect framework, including the REST API, configuration, metrics, worker coordination, etc.
- We could build on Uber's uReplicator, which solves some of the problems with MirrorMaker. However, uReplicator uses Apache Helix to provide features that overlap with Connect, e.g. REST API, live configuration changes, cluster management, coordination, etc. A native MirrorMaker should only use native parts of Apache Kafka. That said, uReplicator is a major inspiration for the MM2 design.
- We could provide a high-level REST API for remote-controlling a MirrorMaker cluster. However, this has overlapping concerns with Connect.