Page History

...

Info

title	Version

Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive 0.10.0.
See HIVE-2397, HIVE-3433, HIVE-3471, and HIVE-3613.
Also see HIVE-3552 for an improvement added in Hive 0.11.0.

Info

title	Version

GROUPING__ID is compliant with semantics in other SQL engines starting in Hive 2.3.0 (see HIVE-16102).
Support for SQL grouping function was added in Hive 2.3.0 too (see HIVE-15409).

For general information about GROUP BY, see GroupBy in the Language Manual.

...

This function returns a bitvector corresponding to whether each column is present or not. For each column, a value of "10" is produced for a row in
the result set if that column has been aggregated in that row, otherwise the value is "01". This can be used to differentiate when there are nulls
in the data.

...

The following query: SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
will have the following results.

Column 1 (key)	Column 2 (value)	GROUPING__ID	count(*)
NULL	NULL	03	6
1	NULL	10	2
1	NULL	31	1
1	1	30	1
2	NULL	1	1
2	2	30	1
3	NULL	10	2
3	NULL	31	1
3	33	0	1
4	NULL	1	1
4	5	30	1

Note that the third column is a bitvector of columns being selected.
For the first row, none of the columns are being selected.
For the second row, only the first column is being selectedboth the columns are being selected (and the second column happens to be null), which explains the count of 2value 0.
For the third row, both the columns are being selected (and the second column happens to be null), which explains the count of 1.only the first column is being selected, which explains the value 1.

Grouping function

The grouping function indicates whether an expression in a GROUP BY clause is aggregated or not for a given row. The value 0 represents a column that is part of the grouping set, while the value 1 represents a column that is not part of the grouping set.

Going back to our example above, consider the following query:

SELECT key, value, GROUPING__ID, grouping(key, value), grouping(value, key), grouping(key), count(*) from T1 GROUP BY key, value WITH ROLLUP

This query will produce the following results.

Column 1 (key)	Column 2 (value)	GROUPING__ID	grouping(key, value)	grouping(value, key)	grouping(key)	grouping(value)	count(*)
NULL	NULL	3	3	3	1	1	6
1	NULL	0	0	0	0	0	2
1	NULL	1	1	2	0	1	1
1	1	0	0	0	0	0	1
2	NULL	1	1	2	0	1	1
2	2	0	0	0	0	0	1
3	NULL	0	0	0	0	0	2
3	NULL	1	1	2	0	1	1
3	3	0	0	0	0	0	1
4	NULL	1	1	2	0	1	1
4	5	0	0	0	0	0	1

Cubes and Rollups

The general syntax is WITH CUBE/ROLLUP. It is used with the GROUP BY only. CUBE creates a subtotal of all possible combinations of the set of column in its argument. Once we compute a CUBE on a set of dimension, we can get answer to all possible aggregation questions on those dimensions.

...

Space shortcuts

Child pages

Versions Compared

Old Version 31

New Version 32

Key

Grouping function

Cubes and Rollups