Motivation
Add support for SQL-compatible window functions to SQL++
User Model
Window function call ::= function_name(arg1, arg2, ... argN) OVER (frame_var AS)? ( (PARTITION BY expr1, expr2, …. exprN)? (ORDER BY exprA, exprB, … exprN)? Frame_Spec? )
Frame_Spec ::= ( ROWS | RANGE | GROUPS ) Frame_Extent Frame_Exclusion?
Frame_Extent ::= Frame_Bound | BETWEEN Frame_Bound AND Frame_Bound
Frame_Bound ::= UNBOUNDED PRECEEDING | expr PRECEEDING | CURRENT ROW | UNBOUNDED FOLLOWING | expr FOLLOWING
Frame_Exclusion ::= EXCLUDE ( CURRENT ROW | GROUP | TIES | NO OTHERS )
Function_name is one of the following:
- Running aggregate function: row-number(), rank(), dense_rank(), pecent_rank(), ntile()
- SQL aggregate function: count(), sum(), avg(), min(), max(), etc
- SQL++ aggregate function: array_count(), array_sum(), strict_count(), strict_sum(), etc
frame_var is a variable which is bound to the frame contents and will be in-scope for function arguments. It contains an array (or multiset if there’s no order by) of objects where each object has the same structure as the GROUP AS variable object (one field for each variable in the current scope).
Example:
from emp select array_sum((from w select value w.emp.salary)) over w as (partition by … )
Design
The following components were added:
SQL++ Expression
org.apache.asterix.lang.sqlpp.expression.WindowExpression
This expression is for a single window function call.
Contents:
- List of ‘partition by’ expressions
- List of ‘order by’ expressions and order modifiers
- Frame mode: (enum) RANGE | ROWS | GROUPS
- Frame boundary start, end : (enum) CURRENT_ROW | UNBOUNDED_PRECEDING | UNBOUNDED_FOLLOWING | BOUNDED_PRECEDING (+ boundary expression) | BOUNDED_FOLLOWING ( + boundary expression)
- Frame exclusion kind: (enum) CURRENT_ROW | GROUP | TIES | NO_OTHERS
- Frame variable and its field list mapping
- Function call expression
Algebricks Operator
org.apache.hyracks.algebricks.core.algebra.operators.logical.WindowOperator
This operator can evaluate multiple function calls over the same window definition
Contents:
- List of ‘partition by’ expressions
- List of ‘order by’ expressions and order modifiers
- List of ‘frame start’ expressions (empty if unbounded)
- List of ‘frame end’ expressions (empty if unbounded)
- List of ‘frame value’ expressions
- List of ‘frame exclusion’ expressions (empty if exclusion kind is ‘no others’)
- List of output variables and function call expressions. These are running aggregates such as row_number(), rank() that operate on whole partitions (frame definition is not applicable to these functions)
- Nested plan. These are regular SQL++ aggregates that operate on window frames. Each frame is sent into the nested plan through the nested tuple source operator. SQL aggregate functions are rewritten into SQL++ aggregates by SqlppWindowExpressionVisitor
The logic of the operator is a follows:
- Split input into partitions as specified by ‘partition by’ expressions
- Then order tuples within each partition as specified by ‘order by’ expressions
- Then for each partition
- For each tuple compute its running aggregates
- For each tuple find a window frame and compute regular aggregates by sending that frame into the nested plan
Window frame computation:
For each tuple in the partition we need to find a set of tuples from the same partition that match the frame definition. This is essentially a self-join with the following condition:
‘frame value’ >= ‘frame start’ and ‘frame value’ <= ‘frame end’ and ‘frame value’ != ‘frame exclusion’
Expressions generated for ‘frame value’, ‘frame start’ and ‘frame end’ depend on the frame mode and frame boundary specification:
Frame mode | ‘frame value’ expressions |
---|---|
RANGE | { ‘order by’ expr1, expr2, … exprN } |
ROWS | { row_number() } |
GROUPS | { dense_rank() } |
Boundary kind | ‘frame start’ and ‘frame end’ expressions |
---|---|
CURRENT ROW | { ‘frame value’ expr1, expr2, … exprN } all ‘frame value’ expressions’ |
BOUNDED PRECEDING | { ‘frame value’ expression – ‘boundary’ expression } |
BOUNDED FOLLOWING | { ‘frame value’ expression + ‘boundary’ expression } |
UNBOUNDED PRECEDING | {} (empty) |
UNBOUNDED FOLLOWING | {} (empty) |
‘frame exclusion’ expressions use the same mechanism as ‘frame value’ expressions and are generated as follows:
Exclusion kind | ‘frame exclusion’ expressions |
---|---|
NO OTHERS | {} (empty) |
CURRENT ROW | { row_number() } |
GROUP | { dense_rank() } |
TIES | { dense_rank(), not row_number() } |
Hyracks Runtime
Window operator runtime consists of an abstract class org.apache.hyracks.algebricks.runtime.operators.win.AbstractWindowPushRuntime and 3 subclasses:
WindowSimplePushRuntime – used for row_number(), rank(), dense_rank() – running aggregates that do not require information about partition length
WindowMaterializingPushRuntime – used for percen_rank(), ntile() – running aggregates that require information about partition length
WindowNestedPlansPushRuntime – used for regular aggregates – this is the only implementation that supports frame computation
Open items
Need to implement remaining window functions: lead(), lag(), first_value(), nth_value(), last_value(), cume_dist()
- Need to create an optimizer rule that combines multiple window operators into a single one if their partition and frame definitions are the same
Optimize performance of WindowNestedPlansPushRuntime