Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Reuse the existing Source connectors built with FLIP-27 without any change.
  • Support an arbitrary combination of sources to form a hybrid source.

Basic Idea

A hybrid source is a source that contains a list of concrete sources. The hybrid source reads from each contained source in the defined order. It switches from source A to the next source B when source A finishes. So that from the users’ perspective, all the sources act as a single source.

In most cases, the hybrid source only contains two sources, but it can contain more sources if needed.

This FLIP proposes to support switching sources with either predetermined start positions or with position conversion at switch time. The former mode is very simple - sources are configured upfront with their start/end positions and wrapped into HybridSource. No special support is required on existing sources.

With position conversion at switch time the end position of the current source is converted into the start position of the next source. This requires support in the split enumerator of the current source to provide the end position, support in the next source to set the start position (like the start timestamp in KafkaSource) and a user supplied function that converts the end position into the start position.

  1. The HybridSource enumerator manages the process of switching between two sources.
  2. A user provided implementation of SourceFactory is used to create the next Source when the previous Source finishes reading.
  3. The SourceFactory is expected to do the following

...

  1. :
    a. Get the end_position 

...

  1. from the 

...

  1. SplitEnumerator of the

...

  1. previous finished Source. (This may require modification of existing source like FileSource to expose the end position.)
    b. Translate that end_position to the start_position of the next source.
    c. Construct and setup the next Source.

Image Added

Image Removed

A hybrid source is a source that contains a list of concrete sources. The hybrid source reads from each contained source in the defined order. It switches from source A to the next source B when source A finishes. So that from the users’ perspective, all the sources act as a single source.

In most cases, the hybrid source only contains two sources, but it can contain more sources if needed.

The switch is done in the following way:

...