Versions Compared

Comment: drop REST API


Status

Current state: Under Discussion / Voting

Discussion thread: here

JIRA: KAFKA-7500

...

  • Leverages the Kafka Connect framework and ecosystem.

  • Includes both source and sink connectors.

  • Includes a high-level driver that manages connectors in a dedicated cluster.
  • High-level REST API abstracts over connectors between multiple Kafka clusters.
  • Detects new topics, partitions.
  • Automatically syncs topic configuration between clusters.

  • Manages downstream topic ACL.

  • Supports "active/active" cluster pairs, as well as any number of active clusters.

  • Supports cross-datacenter replication, aggregation, and other complex topologies.

  • Provides new metrics including end-to-end replication latency across multiple data centers/clusters.

  • Emits offsets required to migrate consumers between clusters.

  • Tooling for offset translation.

  • MirrorMaker-compatible legacy mode.

...

  • MirrorSourceConnector, MirrorSinkConnector, MirrorSourceTask, MirrorSinkTask classes.

  • MirrorCheckpointConnector, MirrorCheckpointTask.
  • MirrorHeartbeatConnector, MirrorHeartbeatTask.
  • MirrorConnectorConfig, MirrorTaskConfig classes.
  • ReplicationPolicy interface. DefaultReplicationPolicy and LegacyReplicationPolicy classes.

  • Heartbeat and checkpoint topics and associated schemas.

  • RemoteClusterUtils class for querying remote cluster reachability and lag, and for translating consumer offsets between clusters.

  • MirrorMaker driver class with main entry point for running MM2 cluster nodes.

  • MirrorMakerConfig used by MirrorMaker driver.
  • ./bin/connect-mirror-maker.sh and ./config/mirror-maker.properties sample configuration.
  • MirrorMaker high-level REST API.

Proposed Changes

Remote Topics, Partitions

...

The MirrorMaker.java driver class and ./bin/connect-mirror-maker.sh script implement a distributed MM2 cluster which does not depend on an existing Connect cluster. Instead, MM2 cluster nodes manage Connect workers internally based on a high-level configuration file and REST API. The configuration file is needed to identify each Kafka cluster. A sample MirrorMakerConfig properties file will be provided in ./config/mirror-maker.properties:

...

Code Block
name = MirrorSourceConnector
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
source.cluster.alias = primary
target.cluster.alias = backup
source.cluster.broker.list = localhost:9091
target.cluster.broker.list = localhost:9092
key.converter.class = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter.class = org.apache.kafka.connect.converters.ByteArrayConverter 

Generally, a single connector of each type (MirrorSourceConnector, MirrorCheckpointConnector, MirrorHeartbeatConnector) is needed for each source→target flow, so the class name (e.g. MirrorSourceConnector) is used as the connector's "name".
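
As a rough illustration of the naming convention above, the following Python sketch enumerates the connectors needed for a set of cluster aliases. It assumes that every ordered pair of distinct clusters is a replication flow, which is a simplification. The helper name `connectors_for_flows` is hypothetical and not part of MM2.

```python
from itertools import permutations

# Connector types named in this KIP; one of each per source->target flow.
CONNECTOR_TYPES = (
    "MirrorSourceConnector",
    "MirrorCheckpointConnector",
    "MirrorHeartbeatConnector",
)


def connectors_for_flows(aliases):
    """Map each (source, target) flow to the connector names it needs.

    Hypothetical helper: assumes every ordered pair of distinct
    cluster aliases is a flow, which real deployments may not want.
    """
    return {
        (source, target): list(CONNECTOR_TYPES)
        for source, target in permutations(aliases, 2)
    }


flows = connectors_for_flows(["primary", "backup"])
for (source, target), names in flows.items():
    print(f"{source}->{target}: {', '.join(names)}")
```

Because the connector name is just the class name, a name is unique only within a single source→target flow, so this sketch keys the result by the (source, target) pair.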

At launch, each such connector is configured to replicate no topics or groups (effectively idle) until these properties are configured via the REST API. Alternatively, the MirrorMaker properties file can specify static configuration properties for each connector to avoid using the REST API:


The MirrorMaker properties file can specify static configuration properties for each connector:

Code Block
clusters = primary, backup
cluster.primary.broker.list = localhost:9091
cluster.backup.broker.list = localhost:9092 
connector.primary->backup.topics = .*
connector.primary->backup.emit.heartbeats = false

...

property                         default value   description
clusters                         required        comma-separated list of Kafka cluster "aliases"
cluster.cluster.broker.list      required        connection information for the specific cluster
cluster.cluster.x.y.z            n/a             passed to workers for a specific cluster
connector.source->target.x.y.z   n/a             passed to a specific connector
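
The prefix convention described above can be sketched in Python as follows. This is a minimal illustration, not the actual MirrorMakerConfig implementation: the function names `parse_properties` and `split_by_prefix` are hypothetical, and the sketch assumes that cluster aliases contain no dots.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def split_by_prefix(props):
    """Group prefixed properties by cluster alias and by source->target flow.

    Illustrative only: mimics the 'cluster.<alias>.' and
    'connector.<source>-><target>.' prefixes, assuming dot-free aliases.
    """
    aliases = [alias.strip() for alias in props["clusters"].split(",")]
    clusters = {alias: {} for alias in aliases}
    connectors = {}
    for key, value in props.items():
        if key.startswith("cluster."):
            alias, _, rest = key[len("cluster."):].partition(".")
            if alias not in clusters:
                raise ValueError(f"undeclared cluster alias: {alias}")
            clusters[alias][rest] = value
        elif key.startswith("connector."):
            flow, _, rest = key[len("connector."):].partition(".")
            source, _, target = flow.partition("->")
            connectors.setdefault((source, target), {})[rest] = value
    return clusters, connectors


SAMPLE = """
clusters = primary, backup
cluster.primary.broker.list = localhost:9091
cluster.backup.broker.list = localhost:9092
connector.primary->backup.topics = .*
connector.primary->backup.emit.heartbeats = false
"""

clusters, connectors = split_by_prefix(parse_properties(SAMPLE))
print(clusters)
print(connectors)
```

Rejecting properties for an alias that is not listed in `clusters` is a design choice of this sketch; the KIP does not say how such properties are handled.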

MirrorMaker REST API

To enable remote control of a dedicated MirrorMaker cluster, a high-level REST API is provided. The REST API includes a subset of the full Connect REST API, providing access to the underlying Connectors.

As with the Connect REST API, configurations can be updated via a PUT request:

Code Block
PUT /from:us-west/to:us-east/connectors/MirrorSourceConnector/config HTTP/1.1
Host: localhost
Accept: application/json

{
	"topics": ".*"
} 

The following endpoints are supported:

...



Walkthrough: Running MirrorMaker 2.0

...

In this mode, MirrorMaker does not require an existing Connect cluster. Instead, a high-level driver manages a collection of Connect workers.

...

Second, launch one or more MirrorMaker cluster nodes:

$ ./bin/connect-mirror-maker.sh mm2.properties

Finally, remote-control the MirrorMaker cluster using the REST API or CLI. The REST API provides a subset of the Connect REST API, including the ability to start, stop, and reconfigure connectors. For example:

...


Running a standalone MirrorMaker connector

...

  • We could release this as an independent project, but we feel that cluster replication should be one of Kafka's fundamental features.

  • We could deprecate MirrorMaker but keep it around. However, we see this as a natural evolution of MirrorMaker, not an alternative solution.

  • We could update MirrorMaker rather than completely rewrite it. However, we'd end up recreating many features provided by the Connect framework, including the REST API, configuration, metrics, worker coordination, etc.

  • We could build on Uber's uReplicator, which solves some of the problems with MirrorMaker. However, uReplicator uses Apache Helix to provide features that overlap with Connect, e.g. REST API, live configuration changes, cluster management, coordination etc. A native MirrorMaker should only use native parts of Apache Kafka. That said, uReplicator is a major inspiration for the MM2 design.

  • We could provide a high-level REST API for remote-controlling a MirrorMaker cluster. However, this has overlapping concerns with Connect.