Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Discussion threadhere

JIRAhere

Released:  ...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The proposal is to change the semantics of punctuate() to be purely "time driven", instead of "part time driven, and part data-driven". That is, the punctuate function triggering will no longer be dependent whether there are new data arriving, and hence not on the timestamps of the arriving data either. Instead it will be triggered only by system wall-clock time.

 

As for users, the programming pattern would be: 

  1. If you need to add a pure time-driven computation logic, use punctuate().
  2. If you need to add a data-driven computation logic, you should always use process(), and in process() users can choose to trigger some specific logic only every some time units, but still when a new data has arrived and hence being processed. 

The above approach will have consequences for implementations built against the current semantics.

 

 

 

A non-invasive alternative would be to leave current semantics as-is but allow users to provide a function determining the timestamp of the stream task. In a similar way to how the TimestampExtractor allows users to decide what the current timestamp is for a given message (event-time, system-time or other), this feature would allow users to decide what the current timestamp is for a given stream task irrespective of the arrival of messages to all of the input partitions. This approach is less preferable to the above due to the extra complexity, however allows for backward compatibility.

 

 

 

Yet another non-invasive alternative could be leave current semantics as the defaults for the puncutate method but allow configuration switch between event and system time.

...