...

In addition, KafkaStream is part of the metadata returned by KafkaMetadataService. It contains a logical mapping to physical Kafka clusters and Kafka topics, and this mapping can be transient and change dynamically. The enumerator detects metadata changes and propagates them to the readers via source events so that the readers can reconcile them. Kafka clusters are uniquely identified by a string id, since several different bootstrap server lists can reach the same Kafka cluster (e.g. "kafka-server1:9092,kafka-server2:9092" and "kafka-server1:9092").
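As a rough illustration of the mapping and the change detection described above, the following sketch models a logical stream keyed by cluster id and computes which clusters were added or removed between two metadata snapshots. The class, record, and method names are hypothetical stand-ins chosen for this example and are not the proposed API. Because the comparison is keyed by cluster id, a different bootstrap server list for the same cluster is not reported as a new cluster.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the names below are illustrative, not the proposed API.
final class KafkaStreamSketch {

    /** Physical details of one Kafka cluster that serves a stream. */
    record ClusterMetadata(String bootstrapServers, Set<String> topics) {}

    /** A logical stream id mapped to physical clusters, keyed by cluster id. */
    record KafkaStream(String streamId, Map<String, ClusterMetadata> clustersById) {}

    /** Cluster ids present in the previous snapshot but absent from the current one. */
    static Set<String> removedClusterIds(KafkaStream previous, KafkaStream current) {
        Set<String> removed = new HashSet<>(previous.clustersById().keySet());
        removed.removeAll(current.clustersById().keySet());
        return removed;
    }

    /** Cluster ids present in the current snapshot but absent from the previous one. */
    static Set<String> addedClusterIds(KafkaStream previous, KafkaStream current) {
        Set<String> added = new HashSet<>(current.clustersById().keySet());
        added.removeAll(previous.clustersById().keySet());
        return added;
    }
}
```

In this sketch an enumerator-like component would call both methods after each metadata fetch and notify readers only when either result is non-empty.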

Consistency Guarantees

KafkaSource guarantees exactly-once reading because offsets move forward only when a checkpoint succeeds. MultiClusterKafkaSource inherits this property because it delegates the functionality to the KafkaSource components. Metadata is checkpointed and can be rebuilt from the reader split state. Exactly-once guarantees hold under the assumption that KafkaMetadataService does not expire a cluster that still contains data to be read. This can be ensured by not destroying the old Kafka cluster until its consumers are drained (no more producer traffic and lag is 0). In practice, a good strategy is to let data expire naturally via the Kafka cluster's retention. During a Kafka migration switchover, the consumer would consume from both the old and the new clusters. With the regular KafkaSource, if Kafka deletes a topic or a cluster is destroyed, exactly-once semantics are not preserved, and the semantics are tightly coupled with storage. The design composes KafkaSource components and delegates responsibilities to them, so it is limited to whatever KafkaSource can do for exactly-once semantics. The KafkaMetadataService and the source metadata reconciliation mechanism are what make it possible to automate migration and prevent data loss.
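The drain condition mentioned above (no more producer traffic and lag is 0) can be written as a simple predicate. This is a minimal sketch under the assumption that a metadata service implementation has some way to observe producer traffic and consumer lag for the old cluster; the record, its fields, and the method name are hypothetical and are not part of the proposal.

```java
// Hypothetical sketch; how traffic and lag are measured is left to the
// KafkaMetadataService implementation and is not specified here.
final class ClusterDrainSketch {

    /** Assumed snapshot of an old cluster's state during a migration. */
    record ClusterState(long producedRecordsPerSecond, long consumerLag) {}

    /** True only when producers have stopped and consumers have caught up. */
    static boolean isSafeToRetire(ClusterState state) {
        return state.producedRecordsPerSecond() == 0 && state.consumerLag() == 0;
    }
}
```

A check like this would gate removing the old cluster from the stream metadata. Relying on retention instead sidesteps the check, since the data expires on its own.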

To make the source more user friendly, a MultiClusterKafkaSourceBuilder will be provided. For example, in batch mode the builder should not turn on periodic KafkaMetadataService discovery; metadata discovery should happen only once, at startup.
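One way a builder could enforce the batch-mode rule is to validate the combination of settings in build(). The sketch below is an assumption about how such a check might look, not the MultiClusterKafkaSourceBuilder itself: the class, the local Boundedness enum (a stand-in for the engine's own notion of bounded and unbounded sources), the setter names, and the choice to reject rather than silently ignore the setting are all hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch of a builder-side validation; not the proposed builder.
final class SourceBuilderSketch {

    /** Local stand-in for bounded (batch) versus unbounded (streaming) mode. */
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** Resulting settings; an empty interval means discovery runs once at startup. */
    record Config(Boundedness boundedness, Optional<Duration> discoveryInterval) {}

    private Boundedness boundedness = Boundedness.CONTINUOUS_UNBOUNDED;
    private Duration discoveryInterval; // null means no periodic discovery

    SourceBuilderSketch setBoundedness(Boundedness boundedness) {
        this.boundedness = boundedness;
        return this;
    }

    SourceBuilderSketch setDiscoveryInterval(Duration discoveryInterval) {
        this.discoveryInterval = discoveryInterval;
        return this;
    }

    Config build() {
        // Batch mode reads a fixed set of clusters, so periodic discovery is rejected.
        if (boundedness == Boundedness.BOUNDED && discoveryInterval != null) {
            throw new IllegalStateException(
                    "Periodic metadata discovery is not allowed in batch mode");
        }
        return new Config(boundedness, Optional.ofNullable(discoveryInterval));
    }
}
```

Failing fast in build() surfaces a misconfiguration before the job starts. Silently disabling discovery in batch mode would be an equally plausible design.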

...