...

In this section, we provide benchmark results that compare the throughput of commonly used operations in batch mode and in stream mode, with and without the proposed backlog processing optimization. This quantifies the throughput gain we can achieve once batch-style processing can be applied to backlog records.


1) Use KeyedStream#reduce to process backlog records.

The ReduceBacklogBenchmark benchmark is run on a MacBook with the latest Flink 1.19-SNAPSHOT and parallelism=1. RocksDB is used as the state backend in stream mode.

Here are the benchmark results:

  • Without the proposed change, in stream mode, with the source generating 1*10^7 backlog records, the average execution time is 28.8 sec.
  • Without the proposed change, in batch mode, with the source generating 1*10^7 backlog records, the average execution time is 11.1 sec.
  • With the proposed change, in stream mode, with the source generating 1*10^7 backlog records, the average execution time is 11.6 sec.

This shows that, with the proposed change, KeyedStream#reduce can run 2.5X faster (28.8 sec vs. 11.6 sec) than it currently does in stream mode during backlog processing, and it comes within 4.3% of the batch-mode throughput (11.6 sec vs. 11.1 sec).
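To make the benchmarked operation concrete, the per-key semantics of KeyedStream#reduce can be sketched in plain Java (no Flink dependency; class and method names here are illustrative, not Flink APIs). In stream mode, Flink emits an updated aggregate for every input record, whereas batch-style processing only needs the final aggregate per key, which is the source of the throughput gain measured above:

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Illustrative sketch of what KeyedStream#reduce computes per key:
// each incoming record is merged into the running aggregate for its key.
public class KeyedReduceSketch {
    static Map<String, Integer> reduceByKey(
            List<Map.Entry<String, Integer>> records,
            BinaryOperator<Integer> reducer) {
        Map<String, Integer> state = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            // Merge the new value into the per-key aggregate.
            state.merge(r.getKey(), r.getValue(), reducer);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        // Final per-key aggregates: a -> 4, b -> 2
        System.out.println(reduceByKey(records, Integer::sum));
    }
}
```

During backlog processing only these final aggregates matter, so the intermediate per-record updates (and the per-record RocksDB state accesses backing them) can be avoided.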



2) Use DataStream#coGroup to process records from two bounded streams and emit results after both inputs have ended.

...

Compatibility, Deprecation, and Migration Plan

...

  • All keyed one-input operators automatically benefit from the optimization during backlog processing.
  • For multi-input operators, performance stays the same as before. To optimize performance during backlog processing, users need to update the operator to make use of the isBacklog attribute.
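As a plain-Java sketch of how a multi-input operator could make use of the isBacklog attribute (this is not the Flink operator API; all names here are hypothetical), the operator can buffer records per key while the input is flagged as backlog and process the whole buffer in one pass once the backlog ends:

```java
import java.util.*;

// Hypothetical sketch: an operator that switches between batch-style
// buffering (while isBacklog is true) and per-record emission
// (once real-time processing resumes).
public class BacklogAwareOperator {
    private boolean isBacklog = true;                  // assume backlog at start
    private final Map<String, List<Integer>> buffer = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    void processRecord(String key, int value) {
        if (isBacklog) {
            // During backlog: buffer per key, no per-record output.
            buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        } else {
            // During real-time processing: emit immediately.
            output.add(key + "=" + value);
        }
    }

    void setBacklog(boolean backlog) {
        if (isBacklog && !backlog) {
            // Backlog ended: flush each key's buffered records in one pass.
            buffer.forEach((k, vs) -> output.add(
                k + "=" + vs.stream().mapToInt(Integer::intValue).sum()));
            buffer.clear();
        }
        isBacklog = backlog;
    }

    List<String> getOutput() { return output; }
}
```

The design point is that the switch happens per operator based on the isBacklog signal, so the same job can process historical records with batch-like throughput and then continue with low-latency per-record processing.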

...
