Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
select_statement UNION [ALL | DISTINCT] select_statement UNION [ALL | DISTINCT] select_statement ...

UNION is used to combine the result from multiple SELECT statements into a single result set.

The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements. 

You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.

Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated. The number and names of columns returned by each select_statement have to be the same. Otherwise, a schema error is thrown.

...

Info
titleVersion information

In Hive 0.12.0 and earlier releases, unions can only be used within a subquery such as "SELECT * FROM (select_statement UNION ALL select_statement UNION ALL ...) unionResult".

As of Hive 0.13.0, unions can also be used in a top-level query: "select_statement UNION ALL select_statement UNION ALL ...". (See HIVE-6189.)

Before Hive 1.2.0, only UNION ALL (bag union) is supported, in which duplicates are not eliminated. UNION (or UNION DISTINCT) is supported since Hive 1.2.0. (See HIVE-9039)