Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Window function call ::= function_name(arg1, arg2, ... argN) OVER (frame_var AS)? 
( (PARTITION BY expr1, expr2, …. exprN)? (ORDER BY exprA, exprB, … exprN)? Frame_Spec? )

...

Frame_Bound ::= UNBOUNDED PRECEEDINGPRECEDING | expr PRECEEDINGPRECEDING | CURRENT ROW | UNBOUNDED FOLLOWING | expr FOLLOWING

...

Function_name is one of the following:

  • Running aggregate Window function: row-numbercume_dist(), dense_rank(), first_value(), lag(), last_value(), lead(), dense_nth_value(), ntile(), rank(), pecent_rankratio_to_report(), row_number(), ntilepercent_rank()
  • SQL aggregate function: count(), sum(), avg(), min(), max(), etc
  • SQL++ aggregate function: array_count(), array_sum(), strict_count(), strict_sum(), etc

frame_var is a variable which is bound to the frame contents and will be in-scope for function arguments. It contains an array (or multiset multi-set if there’s no order by) of objects where each object has the same structure as the GROUP AS variable object (one field for each variable in the current scope).

...

from emp select array_sum((from w select value w.emp.salary)) over w as (partition by emp.dept

Design

The following components were added:

...

 This expression is for a single window function call.

 Contents:

  • Function name
  • Function arguments
  • List of ‘partition by’ expressions
  • List of ‘order by’ expressions and order modifiers
  • Frame mode: (enum) RANGE | ROWS | GROUPS
  • Frame boundary start, end : (enum) CURRENT_ROW | UNBOUNDED_PRECEDING |  UNBOUNDED_FOLLOWING | BOUNDED_PRECEDING (+ boundary expression) | BOUNDED_FOLLOWING ( + boundary expression)
  • Frame exclusion kind: (enum) CURRENT_ROW | GROUP | TIES | NO_OTHERS
  • Frame variable and its field list mapping
  • Function call expression

Algebricks Operator

org.apache.hyracks.algebricks.core.algebra.operators.logical.WindowOperator

...

Window operator runtime consists of an abstract class org.apache.hyracks.algebricks.runtime.operators.win.AbstractWindowPushRuntime and 3 its subclasses:

  • WindowSimplePushRuntime – used for row_number(), rank(), dense_rank(), and some others – running aggregates that do not require information about partition length

  • WindowMaterializingPushRuntime – used for percencume_rankdist(), ntile(), percent_rank() – running aggregates that require information about partition length

  • WindowNestedPlansPushRuntime – used for regular aggregates – this is the only implementation that supports frame computation

Open items

  • .  

    • WindowNestedPlansUnboundedPushRuntime - optimized version used when the frame is equivalent to the whole partition (unbounded preceding to unbounded following)
    • WindowNestedPlansRunningPushRuntime - optimized version used when the frame is unbounded preceding to current row / n following

Open items

  • Optimize performance of WindowNestedPlansPushRuntime: support reverse aggregation steps for sliding frames (x preceding to y following, etc)

  • Need to implement remaining window functions: lead(), lag(), first_value(), nth_value(), last_value(), cume_dist()

  • Need to create an optimizer rule that combines multiple window operators into a single one if their partition and frame definitions are the same
  • Optimize performance of WindowNestedPlansPushRuntime