Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Page properties


Discussion thread
Vote thread
JIRA

Status

Current stateAccepted

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html

Vote thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-145-Support-SQL-windowing-table-valued-function-td45571.html

...

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-19604

...

Release1.13


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Windowing table-valued functions

Tumbling Windows

The table-valued function TUMBLE  assigns a window for each row of a relation based on a time attribute columnWe would like to propose 4 kinds of window TVF: tumbling, hopping, cumulative, session. The return value of TUMBLE is a relation window TVF is a relation that includes all columns of data as well as additional two 3 columns named  named window_start and , window_end and , window_time which to indicate the assigned window. The  The window_time column field is the a time attribute of the record after window TVF, it alwsays equal to "window_end - 1". 

Tumbling Windows

The table-valued function TUMBLE  assigns a window for each row of a relation based on a time attribute column. The return value of TUMBLE is a relation that includes all columns of data as well as additional 3 columns named window_start, window_end, window_time to indicate the assigned window. The original row time attribute "timecol" will be a regular timestamp column after The original row time attribute "timecol" will be a regular timestamp column after window TVF. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”.

TUMBLE  that takes three required parameters and one optional parameter:

Code Block
languagesql

TUMBLE(data, DESCRIPTOR(timecol), size [, offset ])

...

The table-valued function HOP assigns windows that cover rows within the interval of size and shifting every slide based on a timestamp column. The return value of HOP is  is a relation that includes all columns of data as well as additional two columns named additional 3 columns named window_start and , window_end and , window_time which time to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1". The  The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping so hopping sometime is named as “sliding windowing”.

...

The table-valued function CUMULATE assigns windows that cover rows within an initial interval of step size, and expanding to one more step size (keep window start fixed) every step until to the max window size. The return value of CUMULATE is a relation that includes all columns of data as well as additional two columns named 3 columns named window_start and , window_end and , window_time which time to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to " window_end - 1". The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping.

...

The table-valued function SESSION assigns windows that cover rows based on datetime. Within a session window, distances of rows are less than interval. Session window is applied per key. The returned return value of SESSION SE is a relation that includes all columns of data as well as additional three columns named additional 3 columns named window_start and , window_end and , window_time which time to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1". The original row time attribute "timecol" will be a regular timestamp column after window TVF.  

...

SELECT ...
FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF
ON L.window_start = R.window_start AND L.window_end = R.window_end AND ...

...

The existing Grouped window functions, i.e. GROUP BY TUMBLE... are still supported, but will be deprecated. We can drop the old syntax at some point in the future, but this needs another public discussion. We should update the examples, docs to only use the new one. The support for Table API is a future work and should be in a separate FLIP.


Future Work

Support count window with the window TVF

Simplify Polymorphic Table Functions syntax

As disucssed in the mailing list, the current PTF syntax is verbose, it would be great if we can remove TABLE() keyword, that would be easier to be picked up by users. Oracle and SQL Server both don't need such keywords. A simplified query would be like this:


Code Block
languagesql
SELECT *
FROM TUMBLE(inputTable, DESCRIPTOR(timecol), INTERVAL '10' MINUTE);


Support count window with the window TVF

For a For a long time, count window is supported in Table API, but not supported in SQL. With the new window TVF syntax, we can also introduce a new window function for count window in the future. For example, the following TUMBLE_ROW assigns windows in 10 row-count interval. 

Code Block
languagesql
SELECT *
FROM TABLE(
   TUMBLE_ROW(
     data => TABLE inputTable,
     timecol => DESCRIPTOR(timecol),
     size => 10));


Explore more ability of Polymorphic Table Functions

It's very exciting to see the potential ability of PTF.  As we can see, in the future, maybe we can support the following features in SQL in a standard PTF way.

  • advanced operations supported in Table API (FLIP-29), e.g. drop_columns, user-defined-table-aggregate
  • user defined join operator
  • a shortcut TopN function
  • re-assign watermarks?
  • re-partition data, similar feature to Hive DISTRIBUTED BY syntax. 
  • ...