...
Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to be able to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
What Hive is NOT
...
Hadoop is a batch processing system and Hadoop jobs tend to have high latency and incur substantial overheads in job submission and scheduling. As a result - latency for Hive queries is generally very high (minutes) even when data sets involved are very small (say a few hundred megabytes). As a result it cannot be compared with systems such as Oracle where analyses are conducted on a significantly smaller amount of data but the analyses proceed much more iteratively with the response times between iterations being less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.
...
Operator | Operand types | Description | ||
---|---|---|---|---|
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="e84e2a3241cfc98b-6c8ec23c-44ed430f-9c828317-76345ee9eab2b1d24e25814c"><ac:plain-text-body><![CDATA[ | A[n] | A is an Array and n is an int | returns the nth element in the array A. The first element has index 0 e.g. if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar' | ]]></ac:plain-text-body></ac:structured-macro> |
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="e9063060e880f50b-82e46b1d-418d454c-a1828e79-d54b633ec8e04405a174e7c3"><ac:plain-text-body><![CDATA[ | M[key] | M is a Map<K, V> and key has type K | returns the value corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' | ]]></ac:plain-text-body></ac:structured-macro> |
S.x | S is a struct | returns the x field of S e.g for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct. |
...
Return Type | Aggregation Function Name (Signature) | Description | ||
---|---|---|---|---|
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="130cf6aa89eb55ca-92c7350b-437e41d3-837790c7-984301b5764d13ac7ccbffca"><ac:plain-text-body><![CDATA[ | BIGINT | count(*), count(expr), count(DISTINCT expr[, expr_.]) | count(*) - Returns the total number of retrieved rows, including rows containing NULL values; count(expr) - Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. | ]]></ac:plain-text-body></ac:structured-macro> |
DOUBLE | sum(col), sum(DISTINCT col) | returns the sum of the elements in the group or the sum of the distinct values of the column in the group | ||
DOUBLE | avg(col), avg(DISTINCT col) | returns the average of the elements in the group or the average of the distinct values of the column in the group | ||
DOUBLE | min(col) | returns the minimum value of the column in the group | ||
DOUBLE | max(col) | returns the maximum value of the column in the group |
...