Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add version & jiras for cube, rollup, etc.; fix count ; edit for clarity

...

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(star)(*).

In order to count the number of distinct users by gender one could write the following query:

...

Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns. e.g while For example, the following is possible because count(DISTINCT) and sum(DISTINCT) specify the same column:

Code Block
  INSERT OVERWRITE TABLE pv_gender_agg
  SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(*), sum(DISTINCT pv_users.userid)
  FROM pv_users
  GROUP BY pv_users.gender;

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(star)(*).

However, the following query is not allowed. We don't allow multiple DISTINCT expressions in the same query.

...

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(star)(*).

Grouping Sets, Cubes, Rollups, and the GROUPING__ID Function

Info
titleVersion

Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive release 0.10.0.

See Enhanced Aggregation, Cube, Grouping and Rollup for information about these aggregation operators.

Also see the JIRAs:

  • HIVE-2397 Support with rollup option for group by
  • HIVE-3433 Implement CUBE and ROLLUP operators in Hive
  • HIVE-3471 Implement grouping sets in Hive
  • HIVE-3613 Implement grouping_id function

New in Hive release 0.11.0:

  • HIVE-3552 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys