Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In addition, this FLIP also adds EndOfStreamWindows that can be used with the DataStream API to specify whether the window will end only at the end of its inputs. DataStream program (e.g. coGroup and aggregate) can take advantage of this information to significantly increase throughput and reduce resource utilization.


Objectives

The APIs provided in this FLIP achieves the following objectives / benefits:


1) Improve throughput. For use-case that involves a mixture of bounded and unbounded workloads (e.g. the use-case specified in the example), the user can start a single Flink job to address such use-case where the performance of the bounded workload can be optimal (e.g. similar/higher than the corresponding performance in batch mode).


2) Reduce resource usage.  For use-case that involves an operatorA (with unbounded input) depending on the output of another operatorB, where operatorB only emits results at the end of its input, Flink will deploy operatorB after operatorA is finished. This approach reduces the unnecessary resource usage when operatorA is still processing its inputs.


3) Improve usability. For use-case that needs to invoke DataStream APIs (e.g. KeyedStream#window) with a window assigner that covers all input data, users can use the off-the-shelf EndOfStreamWindows provided in this FLIP, instead of writing tens of lines of code to define this WindowAssigner subclass. 


Public Interfaces

1) Add EndOfStreamWindows which is a subclass of WindowAssigner. This class allows users of the DataStream API to specify whether the computation (e.g. co-group, aggregate) should emit data only after end-of-input.

...

4) A blocking input edge with pending records is same as a source with isBacklog=true when an operator determines its RecordAttributes for downstream nodes.

This is needed in order for this FLIP to work with FLIP-327. More specifically, once both FLIP-327 and FLIP-331 are accepted, we need a way to determine the backlog status for input with blocking edge type.


5) When DataStream#coGroup is invoked with EndOfStreamWindows as the window assigner, Flink should generate an operator with isOutputOnEOF = true.

...