Comment: Remove broadcast state section

...

As described above, we think timeWindow() is not a useful operation and therefore propose to deprecate and eventually remove it. The operation can have surprising behaviour, and users should use explicit processing-time or event-time operations instead.

Broadcast State

TODO: Do we need to cover this here?

A powerful feature of the DataStream API is the Broadcast State pattern. This pattern was introduced to let users implement use cases where a “control” stream needs to be broadcast to all downstream tasks, and its broadcast elements, e.g. rules, need to be applied to all incoming elements from another stream. An example is a fraud detection use case where the rules evolve over time.

In this pattern, Flink provides no guarantees about the order in which the inputs are read. Essentially, there is no way to force the broadcast side to be read before the non-broadcast side. Use cases like the one above make sense in the streaming world, where jobs are expected to run for a long period of time on input data that is not known in advance. In these settings, requirements may change over time depending on the incoming data.

In the batch world though, we believe that such use cases do not make much sense, as the inputs (both the elements and the control stream) are expected to be static and known in advance.

Proposal (potentially not in 1.12): Build custom support for the Broadcast State pattern where the broadcast side is read first.
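
To illustrate why the read order matters for bounded input, here is a minimal sketch in plain Java. It does not use any Flink API: the class BroadcastOrderSketch, the Input type, and the helpers process() and broadcastFirst() are hypothetical names made up for this illustration, and a "rule" is simplified to a blocklist entry that flags an element with the same value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Flink code.
public class BroadcastOrderSketch {

    /** One input record: a rule from the control side or a data element. */
    public static final class Input {
        final boolean isRule;
        final String value;

        Input(boolean isRule, String value) {
            this.isRule = isRule;
            this.value = value;
        }

        public static Input rule(String value) {
            return new Input(true, value);
        }

        public static Input element(String value) {
            return new Input(false, value);
        }
    }

    /**
     * Processes the inputs in the given order. An element is flagged only
     * if a rule with the same value has already been seen.
     */
    public static List<String> process(List<Input> inputs) {
        Set<String> rules = new HashSet<>();
        List<String> flagged = new ArrayList<>();
        for (Input in : inputs) {
            if (in.isRule) {
                rules.add(in.value);
            } else if (rules.contains(in.value)) {
                flagged.add(in.value);
            }
        }
        return flagged;
    }

    /** Reorders the inputs so that all rules precede all elements. */
    public static List<Input> broadcastFirst(List<Input> inputs) {
        List<Input> ordered = new ArrayList<>();
        for (Input in : inputs) {
            if (in.isRule) {
                ordered.add(in);
            }
        }
        for (Input in : inputs) {
            if (!in.isRule) {
                ordered.add(in);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Input> interleaved = Arrays.asList(
                Input.element("acct-1"),
                Input.rule("acct-1"),
                Input.element("acct-2"),
                Input.rule("acct-2"),
                Input.element("acct-1"));

        // Arbitrary read order: early elements miss rules that arrive later.
        System.out.println(process(interleaved));                 // [acct-1]
        // Broadcast side read first: every element sees every rule.
        System.out.println(process(broadcastFirst(interleaved))); // [acct-1, acct-2, acct-1]
    }
}
```

With the interleaved order, the result depends on when each rule happens to arrive. With the broadcast side read first, the result is determined by the input alone, which is the behaviour one would expect when both sides are static and known in advance.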

Incremental updates vs. "final" updates in BATCH vs. STREAM execution mode

...