Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In this FLIP, we propose to optimize performance for the above use-cases by allowing an operator to effectively switch its execution mode from batch to stream based on the "backlog" status of the records emitted by the source operators.

NOTE: this FLIP focuses only on the capability to switch from batch to stream mode. If there is any extra API needed to support switching from stream to batch mode, we will discuss them in a follow-up FLIP.



Behavior changes when switching from batch mode to stream mode

...

  • In mixed mode, the same shuffle strategy as stream mode will be used to set shuffle type between tasks.
  • Keyed inputs input of one input operator will be automatically sorted during isBacklog=true if it doesn't sort the input internally. Multiple
  • Keyed inputs of multiple inputs operator are not automatically sorted. It can sort the input inputs internally.


3) Watermark strategy

...

  • Source operator does not emit watermark while isBacklog=true.
  • At the point when isBacklog switches to false, source operator emits RecordAttributesAt the point when isBacklog switches to false, source operator emits RecordAttributes(isBacklog=false) and the greatest watermark during backlog processing.

  • Source operator emits watermark based on the user-specified WatermarkStrategy and pipeline.auto-watermark-interval while it has not reached EOF.

  • Once the source operator reaches EOF, it emits watermark of Long.MAX_VALUE after it has emitted all records.

...