...
- Windowing functions
- LEAD
- The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
- Returns null when the lead for the current row extends beyond the end of the window.
- LAG
- The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
- Returns null when the lag for the current row extends before the beginning of the window.
- FIRST_VALUE
- LAST_VALUE
- The OVER clause
- OVER with standard aggregates:
- OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
- OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
- Analytics functions
- RANK
- ROW_NUMBER
- DENSE_RANK
- CUME_DIST
- PERCENT_RANK
- NTILE
Distinct support in Hive 2.1.0 and later (see HIVE-9534)
Distinct is supported for aggregation functions including SUM, COUNT and AVG, which aggregate over the distinct values within each partition. Current implementation has the limitation that no ORDER BY or window specification can be supported in the partitioning clause for performance reason. The supported syntax is as follows.
No Format |
---|
COUNT(DISTINCT a) OVER (PARTITION BY c) |
Aggregate functions in OVER clause support in Hive 2.1.0 and later (see HIVE-13475)
Support to reference aggregate functions within the over OVER clause has been added. For instance, currently we can use the SUM aggregation function within the OVER clause as follows.
Code Block |
---|
|
SELECT rank() OVER (ORDER BY sum(b))
FROM T
GROUP BY a; |
Examples
This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.
PARTITION BY with one partitioning column, no ORDER BY or window specification
Code Block |
---|
|
SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T; |
PARTITION BY with two partitioning columns, no ORDER BY or window specification
Code Block |
---|
|
SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T; |
PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
Code Block |
---|
|
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)
FROM T; |
PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
Code Block |
---|
|
SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T; |
PARTITION BY with partitioning, ORDER BY, and window specification
Code Block |
---|
|
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T; |
...
Code Block |
---|
|
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T; |
There can be multiple OVER
clauses in a single query. A single OVER
clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):
Code Block |
---|
|
SELECT
a,
COUNT(b) OVER (PARTITION BY c),
SUM(b) OVER (PARTITION BY c)
FROM T; |
Aliases can be used as well, with or without the keyword AS:
Code Block |
---|
|
SELECT
a,
COUNT(b) OVER (PARTITION BY c) AS b_count,
SUM(b) OVER (PARTITION BY c) b_sum
FROM T; |
WINDOW clause
Code Block |
---|
|
SELECT a, SUM(b) OVER w
FROM T
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING); |
LEAD using default 1 row lead and not specifying default value
Code Block |
---|
|
SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T; |
LAG specifying a lag of 3 rows and default value of 0
Code Block |
---|
|
SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)
FROM T; |
Distinct counting for each partition
Code Block |
---|
|
SELECT a, COUNT(distinct a) OVER (PARTITION BY b)
FROM T; |