...

As described above, event time is the only sensible time characteristic for batch. We therefore propose to change the default value of the StreamTimeCharacteristic from ProcessingTime to EventTime. This means that DataStream API programs that were using event time before now just work without manually changing this setting. Processing-time programs will also still work, because setting processing-time timers does not depend on the StreamTimeCharacteristic. DataStream programs that don't set a TimestampAssigner or WatermarkStrategy will also still work if they don't use operations that rely on (event-time) timestamps. This is true for both BATCH and STREAMING execution mode.
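
To make this concrete, here is a minimal sketch using the standard Flink DataStream APIs (the Event POJO and the job itself are hypothetical, invented for illustration): once EventTime is the default, assigning a WatermarkStrategy is all that is needed, with no call to setStreamTimeCharacteristic().

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.time.Duration;

public class EventTimeDefaultExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // No env.setStreamTimeCharacteristic(...) here: with the proposed change,
        // EventTime is already the default.

        env.fromElements(new Event("a", 1_000L), new Event("b", 2_000L))
                // Event-time timestamps and watermarks come from the WatermarkStrategy.
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                .withTimestampAssigner((event, ts) -> event.timestamp))
                .print();

        env.execute("event-time-default-sketch");
    }

    // Hypothetical event type used only for this sketch.
    public static class Event {
        public String key;
        public long timestamp;

        public Event() {}

        public Event(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }
}
```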

...

Incremental updates vs. "final" updates in BATCH vs. STREAM execution mode

Some of the operations on DataStream have semantics that make sense for stream processing but that should behave differently in BATCH execution mode. For example, KeyedStream.reduce() is essentially a reduce on a GlobalWindow with a Trigger that fires on every element. In database terms it produces an UPSERT stream as output: if you get ten input elements for a key you also get ten output records. For batch processing, it makes more sense to instead produce only one output record per key, containing the result of the aggregation, when we reach the end of stream/key. This will still be correct for downstream consumers that expect an UPSERT stream, but it will change the actual physical output stream that they see. We therefore suggest changing the behaviour of these methods to only emit a final result at the end of input (see the sketch at the end of this section):

...
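
As an illustration of this difference, consider the following minimal sketch (the input data and the job are hypothetical; env.setRuntimeMode() is assumed as the switch between the two execution modes): in STREAMING mode the reduce emits an updated result for every input element, while under the proposed BATCH behaviour it emits only the final result per key at the end of input.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReduceModeExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Assumed mode switch; flip to STREAMING to see the incremental updates.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("a", 3))
                .keyBy(t -> t.f0)
                // STREAMING: emits (a,1), (a,3), (a,6) -- one update per element.
                // BATCH (as proposed): emits only the final (a,6) at end of input.
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .print();

        env.execute("reduce-mode-sketch");
    }
}
```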