Status

Current state: "Under Discussion"

Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)


Motivation

As described in FLIP-131, we are aiming at deprecating the DataSet API in favour of the DataStream API and the Table API. After this work is done, the user will be able to write a program using the DataStream API and this will execute efficiently on both bounded and unbounded data. But before we reach this point, it is worth discussing and agreeing on the semantics of some operations as we transition from the streaming world to the batch one. 

...

How to expose that: As a configuration option named execution.scheduling-mode. This will allow users to use it through:

...

  1. When trying to apply processing time windowing on batch workloads, Flink could throw an exception warning the user.
  2. When the user tries to set Processing Time timers, e.g. in a ProcessFunction, or specify a processing time-based policy, e.g. a rolling policy in the StreamingFileSink, Flink should throw an exception. Ideally, we should throw the exception at compile time, but throwing it at runtime is more realistic. Flink will ignore these timers when executed in batch.
  3. Custom triggers, both in event and processing time, should be treated as optional and potentially be ignored in batch. The reason is that early triggers are simply a means to bypass the previously mentioned correlation between responsiveness and processing time, which does not exist in batch processing.
  4. If event time is the only sensible option for batch, then we may need to consider changing the default value of the TimeCharacteristic from ProcessingTime to EventTime. In this case, if no WatermarkGenerator and TimestampAssigner are set, we can go with INGESTION_TIME semantics.


IMPORTANT PROPOSAL:

If we decide to change the default value of the TimeCharacteristic from ProcessingTime to EventTime and no WatermarkGenerator and TimestampAssigner are set, then we will go with INGESTION_TIME semantics.
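The fallback in this proposal can be sketched as a small decision function. This is a minimal illustration in plain Java, not Flink code: the class TimeSemanticsSketch, the enum Semantics and the method resolve are hypothetical names. It also assumes that the fallback applies only when neither a WatermarkGenerator nor a TimestampAssigner has been set; the proposal does not spell out the case where only one of them is present.

```java
/** Hypothetical model of the proposed default; this is NOT Flink's actual API. */
final class TimeSemanticsSketch {

    /** The two outcomes discussed in the proposal. */
    enum Semantics { EVENT_TIME, INGESTION_TIME }

    /**
     * Resolves the effective time semantics once EventTime is the default.
     *
     * Assumption: ingestion-time semantics are used only when the user
     * supplied neither a watermark generator nor a timestamp assigner.
     */
    static Semantics resolve(boolean hasWatermarkGenerator, boolean hasTimestampAssigner) {
        if (!hasWatermarkGenerator && !hasTimestampAssigner) {
            // Nothing tells us how to extract event time, so fall back.
            return Semantics.INGESTION_TIME;
        }
        return Semantics.EVENT_TIME;
    }
}
```

Under these assumptions, a job that configures nothing would resolve to INGESTION_TIME, while a job that sets both components would keep EVENT_TIME.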


Future Work: In the future we may consider adding options for:

  • firing all the registered processing time timers at the end of a job (at close()), or
  • ignoring all the registered processing time timers at the end of a job.

These options apply to BOTH batch and streaming, and they will make sure that the same job written for streaming can also run in batch.
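To make the two end-of-job behaviours concrete, here is a toy model in plain Java. It is only a sketch and does not use Flink's timer service: EndOfJobTimersSketch, EndOfJobPolicy and all method names are hypothetical. Firing the pending timers in ascending timestamp order is an additional assumption; the proposal does not specify an order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of the two end-of-job options; hypothetical names, NOT Flink's API. */
final class EndOfJobTimersSketch {

    /** The two candidate behaviours for timers still pending at close(). */
    enum EndOfJobPolicy { FIRE_ALL, IGNORE_ALL }

    private final EndOfJobPolicy policy;
    private final List<Long> pendingTimers = new ArrayList<>();
    private final List<Long> firedTimers = new ArrayList<>();

    EndOfJobTimersSketch(EndOfJobPolicy policy) {
        this.policy = policy;
    }

    /** Registers a processing time timer that has not fired yet. */
    void registerProcessingTimeTimer(long timestamp) {
        pendingTimers.add(timestamp);
    }

    /** Models the end of the job: either fire or drop whatever is still pending. */
    void close() {
        if (policy == EndOfJobPolicy.FIRE_ALL) {
            // Assumption: fire in ascending timestamp order.
            pendingTimers.sort(Comparator.naturalOrder());
            firedTimers.addAll(pendingTimers);
        }
        // In both cases nothing remains pending after close().
        pendingTimers.clear();
    }

    /** Returns the timestamps of the timers that fired at close(). */
    List<Long> firedTimers() {
        return firedTimers;
    }
}
```

With FIRE_ALL, every timer registered before close() appears in firedTimers(); with IGNORE_ALL, the list stays empty. The same job logic runs unchanged under either policy, which is the property these options are meant to provide.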

...