Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Remove ampersand from page title because it causes generic page URL when surfed into from the Language Manual.

Windowing and Analytics Functions

Table of Contents

Enhancements to Hive QL

Info
titleVersion

Introduced in Hive version 0.11.

...

  1. Windowing functions
    • LEAD
      • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
      • Returns null when the lead for the current row extends beyond the end of the window.
    • LAG
      • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
      • Returns null when the lag for the current row extends before the beginning of the window.
    • FIRST_VALUE
    • LAST_VALUE
  2. The OVER clause
    • OVER with standard aggregates:
      • COUNT
      • SUM
      • MIN
      • MAX
      • AVG
    • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
    • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
    • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:
      Code Block
      ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
      
  3. Analytics functions
    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE

Examples

This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.

PARTITION BY with one partitioning column, no ORDER BY or window specification

Code Block
SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

PARTITION BY with two partitioning columns, no ORDER BY or window specification

Code Block
SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

Code Block
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)
FROM T;

PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

Code Block
SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

PARTITION BY with partitioning, ORDER BY, and window specification

Code Block
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;

...

Code Block
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

WINDOW clause

Code Block
SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING)

LEAD using default 1 row lead and not specifying default value

Code Block
SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T;

LAG specifying a lag of 3 rows and default value of 0

Code Block
SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
FROM T;