Windowing and Analytics Functions
Enhancements to Hive QL
Info |
---|
|
Introduced in Hive version 0.11. |
...
- Windowing functions
- LEAD
- The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
- Returns null when the lead for the current row extends beyond the end of the window.
- LAG
- The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
- Returns null when the lag for the current row extends before the beginning of the window.
- FIRST_VALUE
- LAST_VALUE
- The OVER clause
- OVER with standard aggregates:
- OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
- OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
- OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:
Code Block |
---|
ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
|
- Analytics functions
- RANK
- ROW_NUMBER
- DENSE_RANK
- CUME_DIST
- PERCENT_RANK
- NTILE
Examples
This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.
PARTITION BY with one partitioning column, no ORDER BY or window specification
Code Block |
---|
SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;
|
PARTITION BY with two partitioning columns, no ORDER BY or window specification
Code Block |
---|
SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;
|
PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
Code Block |
---|
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)
FROM T;
|
PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
Code Block |
---|
SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;
|
PARTITION BY with partitioning, ORDER BY, and window specification
Code Block |
---|
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
|
...
Code Block |
---|
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;
|
WINDOW clause
Code Block |
---|
SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING)
|
LEAD using default 1 row lead and not specifying default value
Code Block |
---|
SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T;
|
LAG specifying a lag of 3 rows and default value of 0
Code Block |
---|
SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
FROM T;
|