Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The MultiClusterKafkaSource must be able to:

  1. Read from multiple clusters, topics, and partitions.
  2. Assign splits (partitions) from multiple clusters.
  3. Checkpoint splits from multiple clusters.
  4. Report Kafka source metrics for multiple clusters.
  5. Communicate metadata changes via source events from enumerator to readers.
  6. Cleanup resources (e.g. clients, thread pools, metrics) related to old clusters.

The components of the KafkaSource solve solves the requirements 1-4 for a single Kafka cluster and the low level components are can be composed into order to provide the functionality to read from for multiple Kafka clusters. For example, an underlying KafkaSourceEnumerator will be used to discover splits, checkpoint assigned splits, and do periodic partition discovery. Likewise, an underlying KafkaSourceReader will be used to poll and deserialize records, checkpoint split state, and commit offsets back to Kafka.

...