...

Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL perform ad-hoc queries, summarization, and data analysis easily. Hive QL also allows traditional map/reduce programmers to plug in custom mappers and reducers for more sophisticated analysis that the language's built-in capabilities may not support.
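As a rough illustration of what a custom mapper might look like, here is a minimal sketch in Python. It assumes the mapper is run as an external script that receives rows as tab-separated lines and emits tab-separated lines in return; the column layout (an id followed by free text) and the function name `map_rows` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Iterator


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical mapper: emit one "word<TAB>1" line per word in column 2."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip rows that do not have the assumed two columns
        for word in fields[1].split():
            yield f"{word.lower()}\t1"


# In-memory stand-in for the rows a script would normally read from stdin.
sample_rows = ["1\tHello world\n", "2\thello Hive\n", "malformed row\n"]
mapped = list(map_rows(sample_rows))
```

In a real script the same function could be fed `sys.stdin` and its output printed line by line; a reducer would then sum the counts per word.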

What Hive is NOT

...

Hadoop is a batch processing system, and Hadoop jobs tend to have high latency and incur substantial overhead in job submission and scheduling. As a result, latency for Hive queries is generally very high (minutes), even when the data sets involved are very small (say, a few hundred megabytes). Hive therefore cannot be compared with systems such as Oracle, where analyses are conducted on a significantly smaller amount of data but proceed much more iteratively, with response times between iterations of less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets, or test queries.
...
Operator	Operand types	Description
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="47f2f3b72170ab2c-49a5bdb6-4df64290-a08eb7cf-bdbe47da4caea931b6374c91"><ac:plain-text-body><![CDATA[
A[n]	A is an Array and n is an int	Returns the element at index n of the array A. The first element has index 0; e.g., if A is the array ['foo', 'bar'], then A[0] returns 'foo' and A[1] returns 'bar'.
]]></ac:plain-text-body></ac:structured-macro>
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="77127ebd769185ca-db84a922-40844511-8f61a0ff-995f43aeab4555c41041505f"><ac:plain-text-body><![CDATA[
M[key]	M is a Map<K, V> and key has type K	Returns the value corresponding to the key in the map; e.g., if M is the map {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, then M['all'] returns 'foobar'.
]]></ac:plain-text-body></ac:structured-macro>
S.x	S is a struct	Returns the x field of S; e.g., for struct foobar {int foo, int bar}, foobar.foo returns the integer stored in the foo field of the struct.
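The three access operators above can be mimicked with ordinary Python values. This is only an analogy, not Hive itself: a list stands in for an Array, a dict for a Map, and a named tuple for a struct. The names `Foobar`, `A`, `M`, and `S` and the field values 1 and 2 are made up for the example; behaviour for out-of-range indexes or missing keys is not described in the table and may differ between Python and Hive.

```python
from collections import namedtuple

# Python stand-ins for Hive complex-type values (assumed sample data).
A = ["foo", "bar"]                              # Array: indexed from 0
M = {"f": "foo", "b": "bar", "all": "foobar"}   # Map<K, V>: looked up by key
Foobar = namedtuple("Foobar", ["foo", "bar"])   # struct foobar {int foo, int bar}
S = Foobar(foo=1, bar=2)

first_element = A[0]     # A[n] with n = 0
second_element = A[1]    # A[n] with n = 1
all_value = M["all"]     # M[key]
foo_field = S.foo        # S.x
```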
...
Return Type	Aggregation Function Name (Signature)	Description
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="48e263fc7c55c0cd-7ad7e7a2-4f994bf3-b21e80d7-8fca2fe04e9e77de5df465e0"><ac:plain-text-body><![CDATA[
BIGINT	count(*), count(expr), count(DISTINCT expr[, expr...])	count(*) - Returns the total number of retrieved rows, including rows containing NULL values; count(expr) - Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
]]></ac:plain-text-body></ac:structured-macro>
DOUBLE	sum(col), sum(DISTINCT col)	Returns the sum of the elements in the group, or the sum of the distinct values of the column in the group.
DOUBLE	avg(col), avg(DISTINCT col)	Returns the average of the elements in the group, or the average of the distinct values of the column in the group.
DOUBLE	min(col)	Returns the minimum value of the column in the group.
DOUBLE	max(col)	Returns the maximum value of the column in the group.
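The difference between the count variants and the DISTINCT forms is easiest to see on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in SQL engine, not Hive; it assumes the standard SQL NULL and DISTINCT semantics described in the table, and the table name `t`, its columns, and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely to illustrate the aggregate semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, val REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1.0), ("a", 1.0), ("a", None), ("a", 3.0)],  # one NULL, one duplicate
)

row = conn.execute(
    """
    SELECT count(*),            -- all rows, including the NULL one
           count(val),          -- rows where val is non-NULL
           count(DISTINCT val), -- unique non-NULL values
           sum(val), sum(DISTINCT val),
           avg(val), avg(DISTINCT val),
           min(val), max(val)
    FROM t
    GROUP BY grp
    """
).fetchone()
conn.close()

(count_all, count_val, count_distinct,
 sum_val, sum_distinct, avg_val, avg_distinct, min_val, max_val) = row
```

With this sample data, count(*) sees four rows, count(val) three, and count(DISTINCT val) two, while the DISTINCT forms of sum and avg ignore the repeated 1.0.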
...
