Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The DSL currently supports windowed aggregations for only two types of time-based window: hopping and tumbling. A third kind of window is defined, but only used in join operations: sliding windows. Users needing sliding window semantics can approximate them with hopping windows where the advance time is 1, but this workaround only artificially resembles the sliding window model; aggregates will be output for every hopping window, of which there will likely be a large number (specifically, the size of the window in milliseconds). The semantics for out-of-order data also differ, as sliding windows have no concept of a grace period and rather represent only "the current result for the current window"

Public Interfaces

Rather than adapt the existing TimeWindows interface (which provides semantics for tumbling and hopping windows), I propose to add a separate SlidingWindows class. This will resemble a stripped-down version of the TimeWindows class, and have only one public method:

...

This is not particularly efficient but I believe it is a good starting point to be improved upon later. "Minimal code changes" are not actually the goal here, rather the hope is to focus on establishing the semantics and behavior as well as the API before moving on to a more complicated design.

PR can be found here

Improved Design

As mentioned above, the initial design is not ideal. The two issues with the simple implementation are that it takes linear time, and requires linear writes to the underlying store. One possibility to improve on this design is as follows:

...