Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Column1 (key)

Column2 (value)

1

NULL

1

1

2

2

3

3

3

NULL

4

5

The following query:

Code Block
sql
sql
SELECT key, value, GROUPING__ID, count(*)

...


FROM T1
GROUP BY key, value WITH ROLLUP;

will have the following results.:

Column 1 (key)

Column 2 (value)

GROUPING__ID

count(*)

NULL

NULL

3

6

1

NULL

0

2

1

NULL

1

1

1

1

0

1

2

NULL

1

1

2

2

0

1

3

NULL

0

2

3

NULL

1

1

3

3

0

1

4

NULL

1

1

4

5

0

1

...

This parameter decides if hive should add an additional map-reduce job. If the grouping set
cardinality (4 in the example above), is more than this value, a new MR job is added under the
assumption that the orginal group by will reduce the data size.

 

Grouping__ID function (before Hive 2.3.0)

Grouping__ID function was fixed in Hive 2.3.0, thus behavior before that release is different (this is expected). For each column, the function would return a value of "0" iif that column has been aggregated in that row, otherwise the value is "1".

Hence the following query:

Code Block
sql
sql
SELECT key, value, GROUPING__ID, count(*)
FROM T1
GROUP BY key, value WITH ROLLUP;

will have the following results.

Column 1 (key)

Column 2 (value)

GROUPING__ID

count(*)

NULL

NULL

0

6

1

NULL

1

2

1

NULL

3

1

1

1

3

1

2

NULL

1

1

2

2

3

1

3

NULL

1

2

3

NULL

3

1

3

3

3

1

4

NULL

1

1

4

5

3

1

Note that the third column is a bitvector of columns being selected. 
For the first row, none of the columns are being selected.
For the second row, only the first column is being selected, which explains the count of 2.
For the third row, both the columns are being selected (and the second column happens to be null), which explains the count of 1.