They run in external processes which may experience outages when both Kafka clusters are otherwise healthy and accepting clients
They represent an additional operational burden beyond just running Kafka
They provide logical replication, in which the offset of a record in the source and target are given different offsets
Replication does not preserve offsets of individual records
Replication does not preserve exactly-once-semantics for records & consumer offsetsThe replication is asynchronous, preventing the extension of strong consistency guarantees between Kafka clusters

Goals

Replicate topics and other associated resources between intentionally separate Kafka clustersProvide physical replication semantics
Offsets for replicated records should be the same as in the origin Kafka
Preserve exactly-once-semantics (immediately, in some configurations, or as a later extension of the feature?)for replicated records

Public Interfaces

User Interface

New AdminClient methods for managing Replication Links on both the source and destination clusters
A single cluster can participate in arbitrarily many links, and both a source and destination simultaneously.
Links accept a configuration, including topics.regex and consumer.groups.regex, and a mode selector that accepts either Asynchronous or Synchronous
- Can a link change it's topics.regex or consumer.groups.regex? What happens if a topic or group is created or deleted while the link is in-sync? Does the link change state?
- Can a link change from synchronous to asynchronous?
Existing Consumers & Producers can access replicated topics without an upgrade or reconfiguration.
Metrics allow observation of the state of each partition included in the replication link and progress of the replication flow (lag, throughput, etc)
During and shortly after network partitions, the link will be out-of-sync while it catches up with the source ISR
Links can be temporarily or permanently disconnected. During temporary disconnects, the replicated topic is not writable. After a permanent disconnect, the replicated topic is writable.manually disconnected, after which the destination topic becomes writable.
- Can we manually reverse a link while keeping consistency?
- Can we reconnect a link, truncating the target if it has diverged?

Data Semantics

Cross-Cluster Replication is similar to Intra-Cluster replication, as both cross-cluster topics and intra-cluster replicas:

...

Are subject to the target cluster's ACL environment
Are not included in acks=all or min-ISR requirements (optional? maybe acks=all is useful/required for EOS producers that want to make sure cross-cluster replication finishes)Are not eligible for fetch-from-follower on the source clustereligible for fetches from source cluster consumers
Have a separate topic-id

Consumer Offsets

...

Replication Mode

Asynchronous mode allows replication of a record to proceed after the source cluster has ack'd the record to the producer.
- Maintains existing latency of source producers, but provides only single-topic consistency guarantees
Synchronous mode requires the source cluster to delay acks for a producer until after the record has been replicated to all ISRs for all attached synchronous replication links.
- Also applies to consumer group offsets submitted by the consumer or transactional producer
- Increases latency of source producers, in exchange for multi-topic & consumer-offset consistency guarantees

Partition Behavior

We must support partition tolerance, which requires choosing between consistency and availability. Availability in the table below means that the specified client can operate, at the expense of consistency with another client. Consistency means that the specified client will be unavailable, in order to be consistent with another client.

Mode	Asynchronous Mode			Synchronous Mode
Link State	Startup	Out-of-sync	Disconnected	Startup	Out-of-sync	Disconnected
Source Consumers	Available¹	Available¹	Available^1,2	Available¹	Available¹	Available^1,2
Source Non-Transactional Producers	Available¹	Available¹	Available^1,2	Available¹	Available¹	Available^1,2
Source Transactional Producers	Available³	Available⁴	Available²	Available³	Consistent⁵	Available²
Target Consumers	Available⁶	Available⁶	Available²	Consistent⁷	Consistent⁸	Available²
Target Producers	Consistent⁹	Consistent⁹	Available²	Consistent⁹	Consistent⁹	Available²

Source clients not involved in transactions will always prioritize availability over consistency
When the link is permanently disconnected, clients on each cluster are not required to be consistent and can show whatever the state was when the link was disconnected.
Transactional clients are available during link startup, to allow creating links on transactional topics without downtime.
Transactional clients are available when asynchronous links are out-of-sync, because the source cluster is allowed to ack transactional produces while async replication is offline or behind.
Transactional clients are consistent when synchronous links are out-of-sync, because the destination topic is readable by target consumers. Transactional produces to a topic with an out-of-sync synchronous replication link should timeout or fail.
Consumers of an asynchronous replicated topic will see partial states while the link is catching up
While starting up a synchronous replicated topic, consumers will be unable to read the partial topic. This allows source transactional produces to proceed while the link is starting (3)
Consumers of a synchronous replicated topic should always see the same contents as the source topic
Producers targeting the replicated topic will fail because the topic is not writable until it is disconnected. Allowing a dual-write to the replicated topic would be inconsistent.

...

Remote and Local Logs

Remote log replication should be controlled by the second tier's provider.
Remote logs can be referenced if not replicated by the second tier's provider so that replicated Kafka topics reference the source remote log storage.

...

The network path between Kafka clusters is assumed to be less reliable have less uptime, lower bandwidth, and higher latency than the intra-cluster network, and have more strict routing constraints.
Network connections for replication links can proceed either source→target or target→source to allow one cluster to be behind a NAT (can we support NAT punching as a first-class feature, to allow both clusters to be behind NAT?)
Allow for simple traffic separation of cross-cluster connections and client/intra-cluster connections (do we need to connect to something other than the other cluster's advertised listeners?)

...

Binary log format
The network protocol and api behavior
Any class in the public packages under clientsConfiguration, especially client configuration
- org/apache/kafka/common/serialization
- org/apache/kafka/common
- org/apache/kafka/common/errors
- org/apache/kafka/clients/producer
- org/apache/kafka/clients/consumer (eventually, once stable)
Monitoring
Command line tools and arguments
Anything else that will likely break existing users in some way when they upgrade

User Stories

Disaster Recovery (multi-zone Asynchronous)

I administrate multiple Kafka clusters in different availability zones
I have a performance-sensitive application that reads in all zones but writes to only one zone at a time. For example, an application that runs consumers in zones A and B to keep caches warm but disables producers in zone B while zone A is running.
I set up an asynchronous Cross-Cluster replication link for my topics and consumer groups from cluster A to cluster B. While the link is being created, applications in zone A are performant, and zone B can warm it's caches with historical data as it is replicated.
I do the same with cluster A and cluster C (and others that may exist)
When zone A goes offline, I want the application in zone B to start writing. I manually disconnect the A→B cross-cluster link, and trigger the application in zone B to begin writing.
What happens to cluster C? Can we connect B→C quickly? What happens if C is ahead of B and truncating C breaks the cluster C consumers?
When zone A recovers, I see that the history has diverged between zone A and B. I manually delete the topics in zone A and re-create the replication link in the opposite direction.

Proposed Changes

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

...

Both the source and target clusters should have a version which includes the Cross-Cluster Replication feature
Clusters which support the Cross-Cluster Replication feature should negotiate on the mutually-supported replication semantics
If one of the clusters is downgraded to a version which does not support Cross-Cluster Replication, the partner cluster should temporarily disable replication's link should fall out-of-sync.

Test Plan

Describe in few sentences how the KIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

...

Space shortcuts

Child pages

Versions Compared

Old Version 3

New Version 4

Key

Goals

Public Interfaces

User Interface

Data Semantics

User Stories

Disaster Recovery (multi-zone Asynchronous)

Proposed Changes

Test Plan

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 3

New Version 4

Key

Goals

Public Interfaces

User Interface

Data Semantics

User Stories

Disaster Recovery (multi-zone Asynchronous)

Proposed Changes

Test Plan