Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Info
titleVersion

Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive 0.10.0.
See HIVE-2397, HIVE-3433, HIVE-3471, and HIVE-3613.
Also see HIVE-3552 for an improvement added in Hive 0.11.0.

Info
titleVersion

GROUPING__ID is compliant with semantics in other SQL engines starting in Hive 2.3.0 (see HIVE-16102).
Support for SQL grouping function was added in Hive 2.3.0 too (see HIVE-15409).

For general information about GROUP BY, see GroupBy in the Language Manual.

...

This function returns a bitvector corresponding to whether each column is present or not. For each column, a value of "10" is produced for a row in
the result set if that column has been aggregated in that row, otherwise the value is "01". This can be used to differentiate when there are nulls
in the data.

...

The following query: SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
will have the following results.

Column 1 (key)

Column 2 (value)

GROUPING__ID

count(*)

 

 

 

 

NULL

NULL

03

6

1

NULL

10

2

1

NULL

31

1

1

1

30

1

2

NULL

1

1

2

2

30

1

3

NULL

10

2

3

NULL

31

1

3

33

0

1

4

NULL

1

1

4

5

30

1

Note that the third column is a bitvector of columns being selected.
For the first row, none of the columns are being selected.
For the second row, only the first column is being selectedboth the columns are being selected (and the second column happens to be null), which explains the count of 2value 0.
For the third row, both the columns are being selected (and the second column happens to be null), which explains the count of 1.only the first column is being selected, which explains the value 1.

Grouping function

The grouping function indicates whether an expression in a GROUP BY clause is aggregated or not for a given row. The value 0 represents a column that is part of the grouping set, while the value 1 represents a column that is not part of the grouping set. 

Going back to our example above, consider the following query:

 SELECT key, value, GROUPING__ID, grouping(key, value), grouping(value, key), grouping(key), count(*) from T1 GROUP BY key, value WITH ROLLUP

This query will produce the following results.

Column 1 (key)

Column 2 (value)

GROUPING__ID

grouping(key, value)grouping(value, key)

grouping(key)

grouping(value)count(*)

NULL

NULL

3

33

1

16

1

NULL

0

00

0

02

1

NULL

1

12

0

11

1

1

0

00

0

01

2

NULL

1

12

0

11

2

2

0

00

0

01

3

NULL

0

00

0

02

3

NULL

1

12

0

11

3

3

0

00

0

01

4

NULL

1

12

0

11

4

5

0

00

0

01


Cubes and Rollups

The general syntax is WITH CUBE/ROLLUP. It is used with the GROUP BY only. CUBE creates a subtotal of all possible combinations of the set of column in its argument. Once we compute a CUBE on a set of dimension, we can get answer to all possible aggregation questions on those dimensions.

...