
Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/zmpnzx6jjsqc0oldvdm5y2n674xzc3jc

JIRA: https://issues.apache.org/jira/browse/FLINK-XXXX

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation


This source will extend the KafkaSource so that it can read from multiple Kafka clusters within a single source.

Some of the challenging use cases that these features solve are:

  1. Transparent Kafka cluster migration without Flink job restart.
  2. Transparent Kafka topic migration without Flink job restart.
  3. Direct integration with Hybrid Source.

Public Interfaces

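The concrete interfaces are still under discussion on the mailing list. As a rough illustration only, the public contract could look something like the sketch below. All names here (KafkaClusterIdentifier, KafkaMetadataService) are hypothetical placeholders, not a committed API.

```java
import java.util.Set;

// Hypothetical identifier for a single Kafka cluster; the real FLIP may differ.
final class KafkaClusterIdentifier {
    private final String name;
    private final String bootstrapServers;

    KafkaClusterIdentifier(String name, String bootstrapServers) {
        this.name = name;
        this.bootstrapServers = bootstrapServers;
    }

    String getName() { return name; }
    String getBootstrapServers() { return bootstrapServers; }
}

// Hypothetical metadata service: the source would poll this to learn which
// clusters it should currently read from, enabling cluster and topic
// migration without a Flink job restart.
interface KafkaMetadataService {
    // Returns the clusters that are currently active for this source.
    Set<KafkaClusterIdentifier> getAllClusters();

    // True if the given cluster is still active (i.e. not yet migrated away).
    boolean isClusterActive(KafkaClusterIdentifier cluster);
}
```

The key design point is that cluster locations become dynamic metadata resolved at runtime, rather than static bootstrap-server configuration baked into the job.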

Proposed Changes


We aim to have connectors that meet the following criteria (where applicable):

  • Source and Sink
  • Supports both Bounded (Batch) and Unbounded (Streaming)
  • Usable in both DataStream and Table API/SQL
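The transparent migration behavior boils down to a reconciliation step: periodically compare the clusters currently assigned to the source against the latest metadata, and compute which clusters to add or remove. A minimal, self-contained sketch of that set arithmetic (class and method names are illustrative, not part of the proposal):

```java
import java.util.HashSet;
import java.util.Set;

final class ClusterReconciler {

    /** Clusters present in the latest metadata but not yet assigned. */
    static Set<String> clustersToAdd(Set<String> assigned, Set<String> latest) {
        Set<String> add = new HashSet<>(latest);
        add.removeAll(assigned);
        return add;
    }

    /** Clusters currently assigned but no longer present in the latest metadata. */
    static Set<String> clustersToRemove(Set<String> assigned, Set<String> latest) {
        Set<String> remove = new HashSet<>(assigned);
        remove.removeAll(latest);
        return remove;
    }
}
```

During a migration, both the old and new cluster would typically appear in the metadata for an overlap period, so readers drain the old cluster before it is removed.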

Compatibility, Deprecation, and Migration Plan

The source is opt-in and would require users to make code changes. The MultiClusterKafkaSource builder could make the conversion easier, code-wise.

In the same vein as the migration from FlinkKafkaConsumer to KafkaSource, state is incompatible between KafkaSource and MultiClusterKafkaSource, so we would recommend resetting all state, or resetting partial state by setting a different uid and starting the application from non-restored state.
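To illustrate how small the conversion could be, here is a self-contained mock of what a MultiClusterKafkaSource-style builder might look like, mirroring the fluent style of KafkaSource.builder(). This is purely a sketch: the class, method names, and options are hypothetical, and the real builder proposed by this FLIP may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fluent builder; a mock for illustration, not the real Flink API.
final class MultiClusterKafkaSourceSketch {
    private final List<String> topics = new ArrayList<>();
    private String metadataServiceAddress;
    private String groupId;

    static MultiClusterKafkaSourceSketch builder() {
        return new MultiClusterKafkaSourceSketch();
    }

    // Instead of fixed bootstrap servers, the source is pointed at a metadata
    // service that resolves the current set of clusters at runtime.
    MultiClusterKafkaSourceSketch setKafkaMetadataService(String address) {
        this.metadataServiceAddress = address;
        return this;
    }

    MultiClusterKafkaSourceSketch setTopics(String... ts) {
        for (String t : ts) topics.add(t);
        return this;
    }

    MultiClusterKafkaSourceSketch setGroupId(String groupId) {
        this.groupId = groupId;
        return this;
    }

    String describe() {
        return "metadata=" + metadataServiceAddress
                + ", topics=" + topics
                + ", groupId=" + groupId;
    }
}
```

Compared with KafkaSource, the only structural change a user would see is swapping the bootstrap-server setting for a metadata-service setting; the rest of the builder chain stays familiar.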




Test Plan

This will be tested by unit and integration tests. The work will extend existing KafkaSource test utilities in Flink to exercise multiple clusters.

Rejected Alternatives

None
