...

Performance optimization is a long-term plan. Window operations work on windowed data, which is an infinite sequence of bounded data sets, and the SLA of such jobs is usually the window size. We can therefore use the duration of a window to buffer data, pre-aggregate it, reduce state operations, and improve throughput. In the long term, window operations could run in a continuous batch mode: the source buffers input data until all data of a specific window has arrived (as indicated by the watermark), then emits that window's data to the downstream operators. All downstream operators are batch operators; they can be scheduled once each window's data is ready and do not need the state mechanism. They process the bounded window data and emit the window result in a batch fashion, which means we can reuse the existing batch-mode optimizations. This approach would be quite efficient. However, it is blocked on several runtime building blocks, such as continuous batch mode and the combination of streaming and batch operators.
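The idea of buffering at the source and handing complete windows to stateless batch operators can be illustrated with a small sketch. It is only a toy model under stated assumptions, not the proposed runtime design: the class `WindowBufferingSource`, the function `batch_sum_by_key`, tumbling windows, integer timestamps, and the rule "a window is complete once the watermark reaches its end" are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class WindowBufferingSource:
    """Hypothetical source that buffers records per tumbling window."""

    def __init__(self, window_size):
        self.window_size = window_size
        # Maps window start -> records buffered for that window.
        self.buffers = defaultdict(list)

    def on_element(self, timestamp, record):
        # Assign the record to the tumbling window containing its timestamp.
        start = timestamp - (timestamp % self.window_size)
        self.buffers[start].append(record)

    def on_watermark(self, watermark):
        """Return (start, end, records) for every window that is complete.

        Assumption: a window [start, end) is complete once watermark >= end.
        """
        ready = sorted(
            s for s in self.buffers if s + self.window_size <= watermark
        )
        return [(s, s + self.window_size, self.buffers.pop(s)) for s in ready]


def batch_sum_by_key(records):
    """Stateless batch operator: aggregates one bounded window in one pass."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


source = WindowBufferingSource(window_size=10)
for ts, rec in [(1, ("a", 1)), (4, ("b", 2)), (9, ("a", 3)), (12, ("a", 5))]:
    source.on_element(ts, rec)

# The watermark reaches 10, so only window [0, 10) is emitted as a batch.
results = [
    (start, end, batch_sum_by_key(recs))
    for start, end, recs in source.on_watermark(10)
]
print(results)
```

In this sketch the downstream operator keeps nothing between windows; the only buffering happens at the source, and the record with timestamp 12 stays there until a later watermark completes its window.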

...