Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Transparent Kafka cluster addition/removal without Flink job restart.
  2. Transparent Kafka topic addition/removal without Flink job restart.
  3. Direct integration with Hybrid Source.

For use case (2), the current KafkaSource cannot support this since automatic partition removal is not supported. First, users cannot delete topics from Kafka directly since that would break the Flink jobs referring to the deleted topics. Second, users would need to change Kafka Source uid and redeploy with allowNonRestoreState to remove a Kafka topic from Flink subscription. In addition, the mailing list feedback describes difficulties to remove a topic since there is a risk of accidental data loss due to human operations. These issues are highlighted for users who maintain multiple Flink jobs, in especially latency sensitive applications.

This source will extend the KafkaSource to be able to read from multiple Kafka clusters within a single source . In addition, the source can adjust the clusters and topics the source consumes from dynamically, without Flink job restartand introduces an interface to enable more sophisticated automation/coordination between Kafka and Flink infrastructure.

Basic Idea

MultiClusterKafkaSource relies on metadata to determine what clusters and topics to subscribe to, and the metadata can change over time so the source will poll for new metadata and dynamically reconcile changes on an interval.

...

Other required functionality leverages and composes the existing KafkaSource implementation for discovering Kafka topic partition offsets and round robin split assignment per Kafka cluster. The diagram diagrams below depicts how the source will reuse the code of the KafkaSource in order to achieve the requirements.

...