Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Page properties


Discussion thread
Vote thread
JIRA

Status

Current stateAccepted

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html

Vote thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-145-Support-SQL-windowing-table-valued-function-td45571.html

...

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-19604

...

Release1.13


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

This requires the parser support of Calcite. Fortunately, Calcite v1.25.0 has already supports tumbling and hopping windowing TVF. We need to extend it to support one more window: cumulative window.

Windowing table-valued functions

We would like to propose 4 kinds of window TVF: tumbling, hopping, cumulative, session. The return value of window TVF is a relation that includes all columns of data as well as additional 3 columns named window_start, window_end, window_time to indicate the assigned window. The window_time field is a time attribute of the record after window TVF, it alwsays equal to support one more window: cumulative window.

Windowing table-valued functions

"window_end - 1"

Tumbling Windows

The table-valued function TUMBLE  assigns a window for each row of a relation based on a time attribute column. The return value of TUMBLE is a relation that includes all columns of data as well as additional two columns named 3 columns named window_start and , window_end and , window_time which  to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1". The  The original row time attribute "timecol" will be a regular timestamp column after window TVF. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”.

TUMBLE  that takes three required parameters and one optional parameter:

Code Block
languagesql

TUMBLE(data, DESCRIPTOR(timecol), size [, offset ])

...

The table-valued function HOP assigns windows that cover rows within the interval of size and shifting every slide based on a timestamp column. The return value of HOP is  is a relation that includes all columns of data as well as additional two columns named additional 3 columns named window_start and window_end and window_time which indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to " window_end - 1". The , window_time to indicate the assigned window. The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping so hopping sometime is named as “sliding windowing”.

...

The table-valued function CUMULATE assigns windows that cover rows within an initial interval of step size, and expanding to one more step size (keep window start fixed) every step until to the max window size. The return value of CUMULATE is a relation that includes all columns of data as well as additional two columns named 3 columns named window_start and , window_end and , window_time which time to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1". The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping.

...

The table-valued function SESSION assigns windows that cover rows based on datetime. Within a session window, distances of rows are less than interval. Session window is applied per key. The returned return value of SESSION SE is a relation that includes all columns of data as well as additional three columns named additional 3 columns named window_start and , window_end and , window_time which time to indicate the assigned window. The window_time column is the time attribute of the record after window TVF, it alwsays equal to " window_end - 1". The original row time attribute "timecol" will be a regular timestamp column after window TVF.  

...

SELECT ...
FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF
ON L.window_start = R.window_start AND L.window_end = R.window_end AND ...

...

Explore more ability of Polymorphic Table Functions

It's very exciting to see the potential ability of PTF.  As we can see, in the future, maybe we can support the following features in SQL in a standard PTF way.

  • advanced operations supported in Table API (FLIP-29), e.g. drop_columns, user-defined-table-aggregate
  • user defined join operator
  • a shortcut TopN function
  • re-assign watermarks?
  • re-partition data, similar feature to Hive DISTRIBUTED BY syntax. 
  • ...