Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/zmpnzx6jjsqc0oldvdm5y2n674xzc3jc

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

This source will extend the KafkaSource to be able to read from multiple Kafka clusters within a single source. In addition, the source can adjust the clusters and topics the source consumes from dynamically, without Flink job restart.

Some of the challenging use cases that these features solve are:

Transparent Kafka cluster addition/removal without Flink job restart.
Transparent Kafka topic addition/removal without Flink job restart.
Direct integration with Hybrid Source.

Public Interfaces

The source will use the FLIP-27: Refactor Source Interface to integrate it with Flink.

This proposal does not include any changes to existing public interfaces. A new MultiClusterKafkaSource builder will serve as the public API and all other APIs will be marked as Internal in this proposal.

Proposed Changes

Describe the new connector you are proposing in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change. Please include details on implementation including what destinations your connector will support, batch and stream support, async or synchronous connector implementation, etc.

We aim to have connectors that meet the following criteria (where applicable):

Source and Sink
Supports both Bounded (Batch) and Unbounded (Streaming)
Usable in both DataStream and Table API/SQL

Compatibility, Deprecation, and Migration Plan

The source is opt in and would require users to implement code changes.

In the same vein as the migration from FlinkKafkaConsumer and KafkaSource, the source state is incompatible between KafkaSource and MultiClusterKafkaSource so it is recommended to reset all state or reset partial state by setting a different uid and starting the application from nonrestore state.

Test Plan

This will be tested by unit and integration tests. The work will extend existing KafkaSource test utilities in Flink to exercise multiple clusters.

Rejected Alternatives

None

Page tree

FLIP-246: Multi Cluster Kafka Source