...

This reader is responsible for reading from one or more clusters. At startup, the reader first sends a source event to the enumerator to fetch the latest metadata before it starts working on its splits (restored from state, if any exists). This enables the reader to filter its splits and "remove" invalid ones (e.g. remove a topic partition from consumption). This is also done because it is hard to reason about reader failure during split assignment; the most reliable protocol is for the readers to request metadata at startup.
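The filtering step can be illustrated with a minimal sketch. It is not the actual reader implementation: the `Split` class, the `filterValid` method, and the shape of the metadata (a map from cluster id to the set of active topics) are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitReconciliation {

    /** Hypothetical stand-in for a split: one topic partition on one cluster. */
    public static final class Split {
        public final String cluster;
        public final String topic;
        public final int partition;

        public Split(String cluster, String topic, int partition) {
            this.cluster = cluster;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return cluster + "/" + topic + "-" + partition;
        }
    }

    /**
     * Keeps only the restored splits whose cluster and topic still appear in the
     * latest metadata (assumed shape: cluster id -> active topics).
     */
    public static List<Split> filterValid(List<Split> restored,
                                          Map<String, Set<String>> metadata) {
        List<Split> valid = new ArrayList<>();
        for (Split split : restored) {
            Set<String> activeTopics = metadata.get(split.cluster);
            if (activeTopics != null && activeTopics.contains(split.topic)) {
                valid.add(split);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // Splits as they might be restored from state.
        List<Split> restored = List.of(
                new Split("cluster-a", "orders", 0),
                new Split("cluster-a", "payments", 0),
                new Split("cluster-b", "orders", 1));
        // Latest metadata: "payments" and "cluster-b" are no longer active.
        Map<String, Set<String>> metadata = Map.of("cluster-a", Set.of("orders"));
        System.out.println(filterValid(restored, metadata));
    }
}
```

In this sketch, a split is dropped either because its whole cluster is absent from the metadata or because its topic is no longer listed for that cluster; the surviving splits are the ones the reader would continue to work on.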

There will be error handling for exceptions raised during reconciliation (e.g. a KafkaConsumer WakeupException if a KafkaSourceReader restarts in the middle of a poll). In addition, restarting enumerators involves releasing resources from the underlying thread pools. Metadata reconciliation also enables us to remove topics from KafkaSourceReader processing, since it induces a KafkaSourceReader restart during which splits can be filtered according to the current metadata.
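As a rough illustration of what releasing thread pool resources on restart could look like, the sketch below uses only the standard `java.util.concurrent` API. The `PoolRelease` class, the `release` method, and the timeout value are hypothetical; how the actual enumerator manages its pools is not specified here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolRelease {

    /**
     * Stops accepting new tasks, waits up to timeoutMs for running tasks to
     * finish, then cancels whatever is left. Returns true if the pool
     * terminated within the timeout.
     */
    public static boolean release(ExecutorService pool, long timeoutMs) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            // Cancel remaining work and preserve the interrupt for the caller.
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical pool owned by the component being restarted.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { });
        System.out.println("released cleanly: " + release(pool, 1000));
    }
}
```

Attempting a graceful shutdown before forcing cancellation gives in-flight work a chance to complete, while the timeout bounds how long a restart can be delayed.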

...