...
Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
In order to count the number of distinct users by gender one could write the following query:
...
Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns. e.g while For example, the following is possible because count(DISTINCT) and sum(DISTINCT) specify the same column:
Code Block |
---|
INSERT OVERWRITE TABLE pv_gender_agg SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(*), sum(DISTINCT pv_users.userid) FROM pv_users GROUP BY pv_users.gender; |
Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
However, the following query is not allowed. We don't allow multiple DISTINCT expressions in the same query.
...
Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
Grouping Sets, Cubes, Rollups, and the GROUPING__ID Function
Info | ||
---|---|---|
| ||
Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive release 0.10.0. |
See Enhanced Aggregation, Cube, Grouping and Rollup for information about these aggregation operators.
Also see the JIRAs:
- HIVE-2397 Support with rollup option for group by
- HIVE-3433 Implement CUBE and ROLLUP operators in Hive
- HIVE-3471 Implement grouping sets in Hive
- HIVE-3613 Implement grouping_id function
New in Hive release 0.11.0:
- HIVE-3552 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys