...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

POC of the proposed changes: https://github.com/pnowojski/flink/commits/aligned-sources

Motivation

When using event time and watermarks, even a slight imbalance or data skew at the source level can push Flink into a degenerate state in which some source operator instances are far ahead of others in terms of event time when ingesting records. For example, due to backpressure combined with data skew, one source instance can be many hours behind another. This is not a problem in itself. It becomes one, however, for downstream operators that rely on watermarks to emit data. Such operators (like windowed joins or aggregations) might need to buffer an excessive amount of data, because the minimum watermark across all of their inputs is held back by the lagging source instance. All records emitted by the non-backpressured sources therefore have to be buffered in the state of that downstream operator, which can lead to uncontrolled growth of the operator's state.

...