...
Info | ||
---|---|---|
| ||
Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive 0.10.0. |
Info | ||
---|---|---|
| ||
GROUPING__ID is compliant with semantics in other SQL engines starting in Hive 2.3.0 (see HIVE-16102). |
For general information about GROUP BY, see GroupBy in the Language Manual.
...
This function returns a bitvector corresponding to whether each column is present or not. For each column, a value of "10" is produced for a row in
the result set if that column has been aggregated in that row, otherwise the value is "01". This can be used to differentiate when there are nulls
in the data.
...
The following query: SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
will have the following results.
Column 1 (key) | Column 2 (value) | GROUPING__ID | count(*) |
|
|
|
|
---|---|---|---|---|---|---|---|
NULL | NULL | 03 | 6 | ||||
1 | NULL | 10 | 2 | ||||
1 | NULL | 31 | 1 | ||||
1 | 1 | 30 | 1 | ||||
2 | NULL | 1 | 1 | ||||
2 | 2 | 30 | 1 | ||||
3 | NULL | 10 | 2 | ||||
3 | NULL | 31 | 1 | ||||
3 | 33 | 0 | 1 | ||||
4 | NULL | 1 | 1 | ||||
4 | 5 | 30 | 1 |
Note that the third column is a bitvector of columns being selected.
For the first row, none of the columns are being selected.
For the second row, only the first column is being selectedboth the columns are being selected (and the second column happens to be null), which explains the count of 2value 0.
For the third row, both the columns are being selected (and the second column happens to be null), which explains the count of 1.only the first column is being selected, which explains the value 1.
Grouping function
The grouping function indicates whether an expression in a GROUP BY clause is aggregated or not for a given row. The value 0 represents a column that is part of the grouping set, while the value 1 represents a column that is not part of the grouping set.
Going back to our example above, consider the following query:
SELECT key, value, GROUPING__ID, grouping(key, value), grouping(value, key), grouping(key), count(*) from T1 GROUP BY key, value WITH ROLLUP
This query will produce the following results.
Column 1 (key) | Column 2 (value) | GROUPING__ID | grouping(key, value) | grouping(value, key) | grouping(key) | grouping(value) | count(*) |
---|---|---|---|---|---|---|---|
NULL | NULL | 3 | 3 | 3 | 1 | 1 | 6 |
1 | NULL | 0 | 0 | 0 | 0 | 0 | 2 |
1 | NULL | 1 | 1 | 2 | 0 | 1 | 1 |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
2 | NULL | 1 | 1 | 2 | 0 | 1 | 1 |
2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
3 | NULL | 0 | 0 | 0 | 0 | 0 | 2 |
3 | NULL | 1 | 1 | 2 | 0 | 1 | 1 |
3 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
4 | NULL | 1 | 1 | 2 | 0 | 1 | 1 |
4 | 5 | 0 | 0 | 0 | 0 | 0 | 1 |
Cubes and Rollups
The general syntax is WITH CUBE/ROLLUP. It is used with the GROUP BY only. CUBE creates a subtotal of all possible combinations of the set of column in its argument. Once we compute a CUBE on a set of dimension, we can get answer to all possible aggregation questions on those dimensions.
...