...
If the optimizations proposed in this FLIP are enabled on an operator, its downstream operators can process the intermediate output generated by that operator with less computation, storage and network resources. Suppose the operator previously emitted N output records per key within one max-flush-interval on average; the downstream operators would then only need roughly 1/N of the original resources to handle the incoming data stream, or could increase their throughput by a factor of N with the same resources.
On the other hand, this operator feature would incur the following downsides / overhead:
- The end-to-end latency of intermediate results will be higher. This is because each operator might buffer intermediate results for up to the configured interval before emitting them, which negatively affects the data freshness of the job's output.
- State size will be larger. This is because the corresponding operators need to buffer / merge output records by key before emitting them periodically (see the sketch after this list).
- State backend access overhead will be higher. Each buffering operation incurs one state read and one state write. Note that this overhead can be mitigated by the LRU cache introduced in FLIP-325.
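To make the buffering behavior described above concrete, the following is a minimal, purely illustrative sketch written as a user-level KeyedProcessFunction. The FLIP proposes doing this inside the built-in operators themselves, so the class name, the hard-coded flush interval, and the "keep latest value per key" merge logic below are assumptions made only for this example:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Illustrative per-key buffering: only the latest record of each key within a
 * flush interval is emitted, so downstream operators see at most one record
 * per key per interval instead of N.
 */
public class PerKeyBufferingFunction
        extends KeyedProcessFunction<String, Tuple2<String, Long>, Tuple2<String, Long>> {

    // Hypothetical flush interval; in the FLIP this would come from configuration.
    private static final long FLUSH_INTERVAL_MS = 1000L;

    // Buffered (merged) value for the current key. Updating it on every input
    // record is the extra state read + write mentioned above.
    private transient ValueState<Long> bufferedValue;

    @Override
    public void open(Configuration parameters) {
        bufferedValue = getRuntimeContext().getState(
                new ValueStateDescriptor<>("buffered-value", Types.LONG));
    }

    @Override
    public void processElement(Tuple2<String, Long> record, Context ctx,
                               Collector<Tuple2<String, Long>> out) throws Exception {
        // Merge into the buffer instead of emitting immediately. Here "merge"
        // simply keeps the latest value; an aggregating operator could combine values.
        if (bufferedValue.value() == null) {
            // First record for this key in the current interval: schedule a flush.
            long flushTime = ctx.timerService().currentProcessingTime() + FLUSH_INTERVAL_MS;
            ctx.timerService().registerProcessingTimeTimer(flushTime);
        }
        bufferedValue.update(record.f1);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
                        Collector<Tuple2<String, Long>> out) throws Exception {
        // Flush the buffered record for this key and clear the buffer.
        Long value = bufferedValue.value();
        if (value != null) {
            out.collect(Tuple2.of(ctx.getCurrentKey(), value));
            bufferedValue.clear();
        }
    }
}
```

The sketch also illustrates the trade-offs listed above: records are held back for up to FLUSH_INTERVAL_MS (higher latency), one buffered entry is kept per active key (larger state), and every input record touches keyed state once for the read and once for the update (higher state backend access).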
Built-in operators and functions that would be affected by this FLIP
...