Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add information from Harish in HIVE-4797

Windowing and Analytics Functions

Table of Contents

Enhancements to Hive QL

Info
titleVersion

Introduced in Hive version 0.11.

...

  1. Windowing functions
    • LEAD
      • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
      • Returns null when the lead for the current row extends beyond the end of the window.
    • LAG
      • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
      • Returns null when the lag for the current row extends before the beginning of the window.
    • FIRST_VALUE
    • LAST_VALUE
  2. The OVER clause
    • OVER with standard aggregates:
      • COUNT
      • SUM
      • MIN
      • MAX
      • AVG
    • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
    • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
      • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:

        No Format
      • ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
        
        Note

        The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

        Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

        Lead and Lag functions.

  3. Analytics functions
    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE

...

PARTITION BY with one partitioning column, no ORDER BY or window specification

No Format

SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

PARTITION BY with two partitioning columns, no ORDER BY or window specification

No Format

SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

No Format

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)
FROM T;

PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

No Format

SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

PARTITION BY with partitioning, ORDER BY, and window specification

No Format

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
No Format

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM T;
No Format

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM T;
No Format

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

WINDOW clause

No Format

SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING)

LEAD using default 1 row lead and not specifying default value

No Format

SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T;

LAG specifying a lag of 3 rows and default value of 0

No Format

SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
FROM T;