Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: added links in two syntax sections for HIVE-6348 (3.0.0)

Table of Contents

Order, Sort, Cluster, and Distribute By

This describes the syntax of SELECT clauses ORDER BY, SORT BY, CLUSTER BY, and DISTRIBUTE BY.  See Select Syntax for general information.

Syntax of Order By

The ORDER BY syntax in Hive QL is similar to the syntax of ORDER BY in SQL language except that ORDER BY takes column names only, not column numbers.

Code Block
colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST)           -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy

There are some limitations in the "order by" clause. In the strict mode (i.e., hive.mapred.mode=strict), the order by clause has to be followed by a "limit" clause. The limit clause is not necessary if you set hive.mapred.mode to nonstrict. The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output. If the number of rows in the output is too large, the single reducer could take a very long time to finish.

Note that columns are specified by name, not by position number. However in Hive 0.11.0 and later, columns can be specified by position when configured as follows:

The default sorting order is ascending (ASC).

In Hive 2.1.0 and later, specifying the null sorting order for each of the columns in the "order by" clause is supported. The default null sorting order for ASC order is NULLS FIRST, while the default null sorting order for DESC order is NULLS LAST.

In Hive 3.0.0 and later, order by without limit in subqueries and views will be removed by the optimizer. To disable it, set hive.remove.orderby.in.subquery to false.

Syntax of Sort By

The SORT BY syntax is similar to the syntax of ORDER BY in SQL language.

...

Hive uses the columns in SORT BY to sort the rows before feeding the rows to a reducer. The sort order will be dependent on the column types. If the column is of numeric type, then the sort order is also in numeric order. If the column is of string type, then the sort order will be lexicographical order.

In Hive 3.0.0 and later, sort by without limit in subqueries and views will be removed by the optimizer. To disable it, set hive.remove.orderby.in.subquery to false.

Difference between Sort By and Order By

...