Page History

...

=

...

Hive

...

User

...

Defined

...

Functions

...

=

...

Table of Contents

In the CLI,

...

use

...

the

...

commands

...

below

...

to

...

show

...

the

...

latest

...

documentation:

{

Code Block
} SHOW FUNCTIONS; DESCRIBE FUNCTION <function_name>; {code} h2.

Built-in

...

Operators

...

Relational

...

Operators

...

The
...
following
...
operators
...
compare
...
the
...
passed
...
operands
...
and
...
generate
...
a
...
TRUE
...
or
...
FALSE
...
value
...
depending
...
on
...
whether
...
the
...
comparison
...
between
...
the
...
operands
...
holds.
...
Operator
Operand types
Description
A = B
All primitive types
TRUE if expression A is equal to expression B otherwise FALSE
A == B
None!
Fails because of invalid syntax. SQL uses =, not ==
A <> B
All primitive types
NULL if A or B is NULL, TRUE if expression A is NOT equal to expression B otherwise FALSE
A < B
All primitive types
NULL if A or B is NULL, TRUE if expression A is less than expression B otherwise FALSE
A <= B
All primitive types
NULL if A or B is NULL, TRUE if expression A is less than or equal to expression B otherwise FALSE
A > B
All primitive types
NULL if A or B is NULL, TRUE if expression A is greater than expression B otherwise FALSE
A >= B
All primitive types
NULL if A or B is NULL, TRUE if expression A is greater than or equal to expression B otherwise FALSE
A IS NULL
all types
TRUE if expression A evaluates to NULL otherwise FALSE
A IS NOT NULL
All types
FALSE if expression A evaluates to NULL otherwise TRUE
A LIKE B
strings
NULL if A or B is NULL, TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A(similar to . in posix regular expressions) while the % character in B matches an arbitrary number of characters in A(similar to .* in posix regular expressions) e.g. 'foobar' like 'foo' evaluates to FALSE where as 'foobar' like 'foo_ _ _' evaluates to TRUE and so does 'foobar' like 'foo%'
A RLIKE B
strings
NULL if A or B is NULL, TRUE if string A matches the Java regular expression B(See Java regular expressions syntax), otherwise FALSE e.g. 'foobar' rlike 'foo' evaluates to FALSE where as 'foobar' rlike '^f.*r$' evaluates to TRUE
A REGEXP B
strings
Same as RLIKE

Arithmetic Operators

The following operators support various common arithmetic operations on the operands. All return number types; if any of the operands are NULL, then the result is also NULL.

Operator	Operand types	Description
A + B	All number types	Gives the result of adding A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. e.g. since every integer is a float, therefore float is a containing type of integer so the + operator on a float and an int will result in a float.
A - B	All number types	Gives the result of subtracting B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A * B	All number types	Gives the result of multiplying A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. Note that if the multiplication causing overflow, you will have to cast one of the operators to a type higher in the type hierarchy.
A / B	All number types	Gives the result of dividing B from A. The result is a double type.
A % B	All number types	Gives the reminder resulting from dividing A by B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A & B	All number types	Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A \| B	All number types	Gives the result of bitwise OR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
A ^ B	All number types	Gives the result of bitwise XOR of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands.
~A	All number types	Gives the result of bitwise NOT of A. The type of the result is the same as the type of A.

Logical Operators

The following operators provide support for creating logical expressions. All of them return boolean TRUE, FALSE, or NULL depending upon the boolean values of the operands. NULL behaves as an "unknown" flag, so if the result depends on the state of an unknown, the result itself is unknown.

Operator	Operand types	Description
A AND B	boolean	TRUE if both A and B are TRUE, otherwise FALSE. NULL if A or B is NULL
A && B	boolean	Same as A AND B
A OR B	boolean	TRUE if either A or B or both are TRUE; FALSE OR NULL is NULL; otherwise FALSE
A \|\| B	boolean	Same as A OR B
NOT A	boolean	TRUE if A is FALSE or NULL if A is NULL. Otherwise FALSE.
! A	boolean	Same as NOT A

Complex Type Constructors

The following functions construct instances of complex types.

Constructor Function	Operands	Description
map	(key1, value1, key2, value2, ...)	Creates a map with the given key/value pairs
struct	(val1, val2, val3, ...)	Creates a struct with the given field values. Struct field names will be col1, col2, ...
array	(val1, val2, ...)	Creates an array with the given elements

Operators on Complex Types

The following operators provide mechanisms to access elements in Complex Types

Operator	Operand types	Description
An	A is an Array and n is an int	Returns the nth element in the array A. The first element has index 0 e.g. if A is an array comprising of 'foo', 'bar' then A0 returns 'foo' and A1 returns 'bar'
Mkey	M is a Map<K, V> and key has type K	Returns the value corresponding to the key in the map e.g. if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M'all' returns 'foobar'
S.x	S is a struct	Returns the x field of S. e.g for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct.

Built-in Functions

Mathematical Functions

The following built-in mathematical functions are supported in hive; most return NULL when the argument(s) are NULL:

Return Type	Name(Signature)	Description
BIGINT	round(double a)	Returns the rounded BIGINT value of the double
DOUBLE	round(double a, int d)	Returns the double rounded to d decimal places
BIGINT	floor(double a)	Returns the maximum BIGINT value that is equal or less than the double
BIGINT	ceil(double a), ceiling(double a)	Returns the minimum BIGINT value that is equal or greater than the double
double	rand(), rand(int seed)	Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifiying the seed will make sure the generated random number sequence is deterministic.
double	exp(double a)	Returns e^a where e is the base of the natural logarithm
double	ln(double a)	Returns the natural logarithm of the argument
double	log10(double a)	Returns the base-10 logarithm of the argument
double	log2(double a)	Returns the base-2 logarithm of the argument
double	log(double base, double a)	Return the base "base" logarithm of the argument
double	pow(double a, double p) power(double a, double p)	Return a^p
double	sqrt(double a)	Returns the square root of a
string	bin(BIGINT a)	Returns the number in binary format (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin

...

])

...


string	hex(BIGINT

...

a)

...

hex(string

...

a)

...

If

...

the

...

argument

...

is

...

an

...

int,

...

hex

...

returns

...

the

...

number

...

as

...

a

...

string

...

in

...

hex

...

format.

...

Otherwise

...

if

...

the

...

number

...

is

...

a

...

string,

...

it

...

converts

...

each

...

character

...

into

...

its

...

hex

...

representation

...

and

...

returns

...

the

...

resulting

...

string.

...

(see

...

[

...

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex]

...

)

...


string	unhex(string

...

a)

...

Inverse

...

of

...

hex.

...

Interprets

...

each

...

pair

...

of

...

characters

...

as

...

a

...

hexidecimal

...

number

...

and

...

converts

...

to

...

the

...

character

...

represented

...

by

...

the

...

number.

...


string	conv(BIGINT

...

num,

...

int

...

from_base,

...

int

...

to_base)

...

Converts

...

a

...

number

...

from

...

a

...

given

...

base

...

to

...

another

...

(see

...

[

...

http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv

...

])

...


double	abs(double

...

a)

...

Returns

...

the

...

absolute

...

value

...

int

...

double

...

pmod(int

...

a,

...

int

...

b)

...

pmod(double

...

a,

...

double

...

b)

...

Returns

...

the

...

positive

...

value

...

of

...

a

...

mod

...

b

...


double	sin(double

...

a)

...

Returns

...

the

...

sine

...

of

...

a

...

(a

...

is

...

in

...

radians)

...


double	asin(double

...

a)

...

Returns

...

the

...

arc

...

sin

...

of

...

x

...

if

...

-1<=a<=1

...

or

...

null

...

otherwise

...


double	cos(double

...

a)

...

Returns

...

the

...

cosine

...

of

...

a

...

(a

...

is

...

in

...

radians)

...


double	acos(double

...

a)

...

Returns

...

the

...

arc

...

cosine

...

of

...

x

...

if

...

-1<=a<=1

...

or

...

null

...

otherwise

...

int

...

double

...

positive(int

...

a)

...

positive(double

...

a)

...

Returns

...

a

...

int

...

double

...

negative(int

...

a)

...

negative(double

...

a)

...

Returns

...

-a

...

Collection Functions

The following built-in

...

collection

...

functions

...

are

...

supported

...

in

...

hive:

...

Return

...

Type

...

Name(Signature)

...

	Description
int	size(Map<K.V>)

...

Returns

...

the

...

number

...

of

...

elements

...

in

...

the

...

map

...

type

...


int	size(Array<T>)

...

Returns

...

the

...

number

...

of

...

elements

...

in

...

the

...

array

...

type

Type Conversion Functions

The following type conversion functions are supported in hive:

Return Type	Name(Signature)	Description
Expected "=" to follow "type"	cast(expr as <type>)	Converts the results of the expression expr to <type> e.g. cast('1'

...

as

...

BIGINT)

...

will

...

convert

...

the

...

string

...

'1'

...

to

...

it

...

integral

...

representation.

...

A

...

null

...

is

...

returned

...

if

...

the

...

conversion

...

does

...

not

...

succeed.

...

Date Functions

The following built-in

...

date

...

functions

...

are

...

supported

...

in

...

hive:

...

string

...

from_unixtime(bigint

...

unixtime

...

,

...

string

...

format

...

)

...

Converts

...

the

...

number

...

of

...

seconds

...

from

...

unix

...

epoch

...

(1970-01-01

...

00:00:00

...

UTC)

...

to

...

a

...

string

...

representing

...

the

...

timestamp

...

of

...

that

...

moment

...

in

...

the

...

current

...

system

...

time

...

zone

...

in

...

the

...

format

...

of

...

"1970-01-01

...

00:00:00"

...


bigint	unix_timestamp()

...

Gets

...

current

...

time

...

stamp

...

using

...

the

...

default

...

time

...

zone.

...


bigint	unix_timestamp(string

...

date)

...

Converts

...

time

...

string

...

in

...

format

...

yyyy-MM-dd

...

HH:mm:ss

...

to

...

Unix

...

time

...

stamp,

...

return

...

0

...

if

...

fail:

...

unix_timestamp('2009-03-20

...

11:30:01')

...

=

...

1237573801

...


bigint	unix_timestamp(string

...

date,

...

string

...

pattern)

...

Convert

...

time

...

string

...

with

...

given

...

pattern

...

(see

...

[http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html

...

])

...

to

...

Unix

...

time

...

stamp,

...

return

...

0

...

if

...

fail:

...

unix_timestamp('2009-03-20',

...

'yyyy-MM-dd')

...

=

...

1237532400

...


string	to_date(string

...

timestamp)

...

Returns

...

the

...

date

...

part

...

of

...

a

...

timestamp

...

string:

...

to_date("1970-01-01

...

00:00:00")

...

=

...

"1970-01-01"

...


int	year(string

...

date)

...

Returns

...

the

...

year

...

part

...

of

...

a

...

date

...

or

...

a

...

timestamp

...

string:

...

year("1970-01-01

...

00:00:00")

...

=

...

1970,

...

year("1970-01-01")

...

=

...

1970

...


int	month(string

...

date)

...

Returns

...

the

...

month

...

part

...

of

...

a

...

date

...

or

...

a

...

timestamp

...

string:

...

month("1970-11-01

...

00:00:00")

...

=

...

11,

...

month("1970-11-01")

...

=

...

11

...


int	day(string

...

date)

...

dayofmonth(date)

...

Return

...

the

...

day

...

part

...

of

...

a

...

date

...

or

...

a

...

timestamp

...

string:

...

day("1970-11-01

...

00:00:00")

...

=

...

1,

...

day("1970-11-01")

...

=

...

1

...


int	hour(string

...

date)

...

Returns

...

the

...

hour

...

of

...

the

...

timestamp:

...

hour('2009-07-30

...

12:58:59')

...

=

...

12,

...

hour('12:58:59')

...

=

...

12

...


int	minute(string

...

date)

...

Returns

...

the

...

minute

...

of

...

the

...

timestamp

...


int	second(string

...

date)

...

Returns

...

the

...

second

...

of

...

the

...

timestamp

...


int	weekofyear(string

...

date)

...

Return

...

the

...

week

...

number

...

of

...

a

...

timestamp

...

string:

...

weekofyear("1970-11-01

...

00:00:00")

...

=

...

44,

...

weekofyear("1970-11-01")

...

=

...

44

...


int	datediff(string

...

enddate,

...

string

...

startdate)

...

Return

...

the

...

number

...

of

...

days

...

from

...

startdate

...

to

...

enddate:

...

datediff('2009-03-01',

...

'2009-02-27')

...

=

...

2

...


int	date_add(string

...

startdate,

...

int

...

days)

...

Add

...

a

...

number

...

of

...

days

...

to

...

startdate:

...

date_add('2008-12-31',

...

1)

...

=

...

'2009-01-01'

...


int	date_sub(string

...

startdate,

...

int

...

days)

...

Subtract

...

a

...

number

...

of

...

days

...

to

...

startdate:

...

date_sub('2008-12-31',

...

1)

...

=

...

'2008-12-30'

...

Conditional Functions

T	if(boolean

...

testCondition,

...

T

...

valueTrue,

...

T

...

valueFalseOrNull)

...

Return

...

valueTrue

...

when

...

testCondition

...

is

...

true,

...

returns

...

valueFalseOrNull

...

otherwise

...


T	COALESCE(T

...

v1,

...

T

...

v2,

...

...)

...

Return

...

the

...

first

...

v

...

that

...

is

...

not

...

NULL,

...

or

...

NULL

...

if

...

all

...

v's

...

are

...

NULL

...


T	CASE a WHEN b THEN c WHEN d THEN e* ELSE f END	When a = b, returns c; when a = d, return e; else return f
T	CASE WHEN a THEN b WHEN c THEN d* ELSE e END	When a = true, returns b; when c = true, return d; else return e

String Functions

The following are built-in String functions are supported in hive:

Return Type	Name(Signature)	Description
int	length(string A)	Returns the length of the string
string	reverse(string A)	Returns the reversed string
string	concat(string A, string B...)	Returns the string resulting from concatenating the strings passed in as parameters in order. e.g. concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
string	concat_ws(string SEP, string A, string B...)	Like concat() above, but with custom separator SEP.
string	substr(string A, int start) substring(string A, int start)	Returns the substring of A starting from start position till the end of string A e.g. substr('foobar', 4) results in 'bar' (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr

...

])

...


string	substr(string

...

A,

...

int

...

start,

...

int

...

len)

...

substring(string

...

A,

...

int

...

start,

...

int

...

len)

...

Returns

...

the

...

substring

...

of

...

A

...

starting

...

from

...

start

...

position

...

with

...

length

...

len

...

e.g.

...

substr('foobar',

...

4,

...

1)

...

results

...

in

...

'b'

...

(see

...

[

...

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr

...

])

...


string	upper(string

...

A)

...

ucase(string

...

A)

...

Returns

...

the

...

string

...

resulting

...

from

...

converting

...

all

...

characters

...

of

...

A

...

to

...

upper

...

case

...

e.g.

...

upper('fOoBaR')

...

results

...

in

...

'FOOBAR'

...


string	lower(string

...

A)

...

lcase(string

...

A)

...

Returns

...

the

...

string

...

resulting

...

from

...

converting

...

all

...

characters

...

of

...

B

...

to

...

lower

...

case

...

e.g.

...

lower('fOoBaR')

...

results

...

in

...

'foobar'

...


string	trim(string

...

A)

...

Returns

...

the

...

string

...

resulting

...

from

...

trimming

...

spaces

...

from

...

both

...

ends

...

of

...

A

...

e.g.

...

trim('

...

foobar

...

')

...

results

...

in

...

'foobar'

...


string	ltrim(string

...

A)

...

Returns

...

the

...

string

...

resulting

...

from

...

trimming

...

spaces

...

from

...

the

...

beginning(left

...

hand

...

side)

...

of

...

A

...

e.g.

...

ltrim('

...

foobar

...

')

...

results

...

in

...

'foobar

...

'

...


string	rtrim(string

...

A)

...

Returns

...

the

...

string

...

resulting

...

from

...

trimming

...

spaces

...

from

...

the

...

end(right

...

hand

...

side)

...

of

...

A

...

e.g.

...

rtrim('

...

foobar

...

')

...

results

...

in

...

'

...

foobar'

...


string	regexp_replace(string

...

A,

...

string

...

B,

...

string

...

C)

...

Returns

...

the

...

string

...

resulting

...

from

...

replacing

...

all

...

substrings

...

in

...

B

...

that

...

match

...

the

...

Java

...

regular

...

expression

...

syntax(See

...

Java

...

regular

...

expressions

...

syntax)

...

with

...

C

...

e.g.

...

regexp_replace("foobar",

...

"oo

...

|ar",

...

"")

...

returns

...

'fb.'

...

Note

...

that

...

some

...

care

...

is

...

necessary

...

in

...

using

...

predefined

...

character

...

classes:

...

using

...

'\s'

...

as

...

the

...

second

...

argument

...

will

...

match

...

the

...

letter

...

s;

...

'

...

s'

...

is

...

necessary

...

to

...

match

...

whitespace,

...

etc.

...


string	regexp_extract(string

...

subject,

...

string

...

pattern,

...

int

...

index)

...

Returns

...

the

...

string

...

extracted

...

using

...

the

...

pattern.

...

e.g.

...

regexp_extract('foothebar',

...

'foo(.*?)(bar)',

...

2)

...

returns

...

'bar.'

...

Note

...

that

...

some

...

care

...

is

...

necessary

...

in

...

using

...

predefined

...

character

...

classes:

...

using

...

'\s'

...

as

...

the

...

second

...

argument

...

will

...

match

...

the

...

letter

...

s;

...

'

...

s'

...

is

...

necessary

...

to

...

match

...

whitespace,

...

etc.

...

The

...

'index'

...

parameter

...

is

...

the

...

Java

...

regex

...

Matcher

...

group()

...

method

...

index.

...

See

...

docs/api/java/util/regex/Matcher.html

...

for

...

more

...

information

...

on

...

the

...

'index'

...

or

...

Java

...

regex

...

group()

...

method.

...


string	parse_url(string

...

urlString,

...

string

...

partToExtract

...

,

...

string

...

keyToExtract

...

)

...

Returns

...

the

...

specified

...

part

...

from

...

the

...

URL.

...

Valid

...

values

...

for

...

partToExtract

...

include

...

HOST,

...

PATH,

...

QUERY,

...

REF,

...

PROTOCOL,

...

AUTHORITY,

...

FILE,

...

and

...

USERINFO.

...

e.g.

...

parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',

...

'HOST')

...

returns

...

'facebook.com'.

...

Also

...

a

...

value

...

of

...

a

...

particular

...

key

...

in

...

QUERY

...

can

...

be

...

extracted

...

by

...

providing

...

the

...

key

...

as

...

the

...

third

...

argument,

...

e.g.

...

parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',

...

'QUERY',

...

'k1')

...

returns

...

'v1'.

...


string	get_json_object(string

...

json_string,

...

string

...

path)

...

Extract

...

json

...

object

...

from

...

a

...

json

...

string

...

based

...

on

...

json

...

path

...

specified,

...

and

...

return

...

json

...

string

...

of

...

the

...

extracted

...

json

...

object.

...

It

...

will

...

return

...

null

...

if

...

the

...

input

...

json

...

string

...

is

...

invalid.

...

NOTE:

...

The

...

json

...

path

...

can

...

only

...

have

...

the

...

characters

...

0-9a-z_

...

,

...

i.e.,

...

no

...

upper-case

...

or

...

special

...

characters.

...

Also,

...

the

...

keys

...

*cannot

...

start

...

with

...

numbers.*

...

This

...

is

...

due

...

to

...

restrictions

...

on

...

Hive

...

column

...

names.

...


string	space(int

...

n)

...

Return

...

a

...

string

...

of

...

n

...

spaces

...


string	repeat(string

...

str,

...

int

...

n)

...

Repeat

...

str

...

n

...

times

...


int	ascii(string

...

str)

...

Returns

...

the

...

numeric

...

value

...

of

...

the

...

first

...

character

...

of

...

str

...


string	lpad(string

...

str,

...

int

...

len,

...

string

...

pad)

...

Returns

...

str,

...

left-padded

...

with

...

pad

...

to

...

a

...

length

...

of

...

len

...


string	rpad(string

...

str,

...

int

...

len,

...

string

...

pad)

...

Returns

...

str,

...

right-padded

...

with

...

pad

...

to

...

a

...

length

...

of

...

len

...


array	split(string

...

str,

...

string

...

pat)

...

Split

...

str

...

around

...

pat

...

(pat

...

is

...

a

...

regular

...

expression)

...


int	find_in_set(string

...

str,

...

string

...

strList)

...

Returns

...

the

...

first

...

occurance

...

of

...

str

...

in

...

strList

...

where

...

strList

...

is

...

a

...

comma-delimited

...

string.

...

Returns

...

null

...

if

...

either

...

argument

...

is

...

null.

...

Returns

...

0

...

if

...

the

...

first

...

argument

...

contains

...

any

...

commas.

...

e.g.

...

find_in_set('ab',

...

'abc,b,ab,c,def')

...

returns

...

3

...


array<array<string>>	sentences(string

...

str,

...

string

...

lang,

...

string

...

locale)

...

Tokenizes

...

a

...

string

...

of

...

natural

...

language

...

text

...

into

...

words

...

and

...

sentences,

...

where

...

each

...

sentence

...

is

...

broken

...

at

...

the

...

appropriate

...

sentence

...

boundary

...

and

...

returned

...

as

...

an

...

array

...

of

...

words.

...

The

...

'lang'

...

and

...

'locale'

...

are

...

optional

...

arguments.

...

e.g.

...

sentences('Hello

...

there!

...

How

...

are

...

you?')

...

returns

...

(

...

("Hello",

...

"there"),

...

("How",

...

"are",

...

"you")

...

)

...

array<struct<string,double>>

...

ngrams(array<array<string>>,

...

int

...

N,

...

int

...

K,

...

int

...

pf)

...

Returns

...

the

...

top-k

...

N-grams

...

from

...

a

...

set

...

of

...

tokenized

...

sentences,

...

such

...

as

...

those

...

returned

...

by

...

the

...

sentences()

...

UDAF.

...

See

...

Hive-StatisticsAndDataMining

...

for

...

more

...

information.

...

array<struct<string,double>>

...

context_ngrams(array<array<string>>,

...

array<string>,

...

int

...

K,

...

int

...

pf)

...

Returns

...

the

...

top-k

...

contextual

...

N-grams

...

from

...

a

...

set

...

of

...

tokenized

...

sentences,

...

given

...

a

...

string

...

of

...

"context".

...

See

...

Hive-StatisticsAndDataMining

...

for

...

more

...

information.

...

Misc.

...

Functions

...

xpath

The following functions are described in Hive-LanguageManual-XPathUDF

...

:

...

xpath,

...

xpath_short,

...

xpath_int,

...

xpath_long,

...

xpath_float,

...

xpath_double,

...

xpath_number,

...

xpath_string

...

get_json_object

...

A

...

limited

...

version

...

of

...

JSONPath

...

is

...

supported:

$ : Root object
. : Child operator
[] : Subscript operator for array
* : Wildcard for []

Syntax not supported that's worth noticing:

: Zero length string as key
.. : Recursive descent
@ : Current object/element
() : Script expression
?() : Filter (script) expression.
, : Union operator
end.step : array slice operator

Example: src_json table is a single column (json), single row table:

Code Block



 * $   : Root object
 * .   : Child operator
 * []  : Subscript operator for array
 * *   : Wildcard for []

Syntax not supported that's worth noticing:

 * : Zero length string as key
 * ..  : Recursive descent
 * @   : Current object/element
 * ()  : Script expression
 * ?() : Filter (script) expression.
 * [,] : Union operator
 * [start:end.step] : array slice operator

Example: src_json table is a single column (json), single row table:

{code}
+----+
                               json
+----+
{"store":
  {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
   "bicycle":{"price":19.95,"color":"red"}
  },
 "email":"amy@only_for_json_udf_test.net",
 "owner":"amy"
}
+----+
{code}

The

...

fields

...

of

...

the

...

json

...

object

...

can

...

be

...

extracted

...

using

...

these

...

queries:

{

Code Block

}
hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
amy

hive> SELECT get_json_object(src_json.json, '$.store.fruit[0]') FROM src_json;
{"weight":8,"type":"apple"}

hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
NULL
{code}
h2.

Built-in

...

Aggregate

...

Functions

...

(UDAF)

...

The
...
following
...
are
...
built-in
...
aggregate
...
functions
...
are
...
supported
...
in
...
Hive:
...
Return
...
Type*
...
*
...
Name(Signature)*
...
*
...
Description*
...
*
...
bigint
count(),
...
count(expr),
...
count(DISTINCT
...
expr
...
,
...
expr_.
...
)
...
count()
...
-
...
Returns
...
the
...
total
...
number
...
of
...
retrieved
...
rows,
...
including
...
rows
...
containing
...
NULL
...
values;
...
count(expr)
...
-
...
Returns
...
the
...
number
...
of
...
rows
...
for
...
which
...
the
...
supplied
...
expression
...
is
...
non-NULL;
...
count(DISTINCT
...
expr
...
,
...
expr
...
)
...
-
...
Returns
...
the
...
number
...
of
...
rows
...
for
...
which
...
the
...
supplied
...
expression(s)
...
are
...
unique
...
and
...
non-NULL.
...
double
sum(col),
...
sum(DISTINCT
...
col)
...
Returns
...
the
...
sum
...
of
...
the
...
elements
...
in
...
the
...
group
...
or
...
the
...
sum
...
of
...
the
...
distinct
...
values
...
of
...
the
...
column
...
in
...
the
...
group
...
double
avg(col),
...
avg(DISTINCT
...
col)
...
Returns
...
the
...
average
...
of
...
the
...
elements
...
in
...
the
...
group
...
or
...
the
...
average
...
of
...
the
...
distinct
...
values
...
of
...
the
...
column
...
in
...
the
...
group
...
double
min(col)
...
Returns
...
the
...
minimum
...
of
...
the
...
column
...
in
...
the
...
group
...
double
max(col)
...
Returns
...
the
...
maximum
...
value
...
of
...
the
...
column
...
in
...
the
...
group
...
double
var_pop(col)
...
Returns
...
the
...
variance
...
of
...
a
...
numeric
...
column
...
in
...
the
...
group
...
double
var_samp(col)
...
Returns
...
the
...
unbiased
...
sample
...
variance
...
of
...
a
...
numeric
...
column
...
in
...
the
...
group
...
double
stddev_pop(col)
...
Returns
...
the
...
standard
...
deviation
...
of
...
a
...
numeric
...
column
...
in
...
the
...
group
...
double
stddev_samp(col)
...
Returns
...
the
...
unbiased
...
sample
...
standard
...
deviation
...
of
...
a
...
numeric
...
column
...
in
...
the
...
group
...
double
covar_pop(col1,
...
col2)
...
Returns
...
the
...
population
...
covariance
...
of
...
a
...
pair
...
of
...
numeric
...
columns
...
in
...
the
...
group
...
double
covar_samp(col1,
...
col2)
...
Returns
...
the
...
sample
...
covariance
...
of
...
a
...
pair
...
of
...
a
...
numeric
...
columns
...
in
...
the
...
group
...
double
corr(col1,
...
col2)
...
Returns
...
the
...
Pearson
...
coefficient
...
of
...
correlation
...
of
...
a
...
pair
...
of
...
a
...
numeric
...
columns
...
in
...
the
...
group
...
double
percentile(col,
...
p)
...
Returns
...
the
...
exact
...
p^th^
...
percentile
...
of
...
an
...
integer
...
column
...
in
...
the
...
group
...
(does
...
not
...
work
...
with
...
floating
...
point
...
types).
...
p
...
must
...
be
...
between
...
0
...
and
...
1.
...
array<double>
percentile(col,
...
array(p~1,,
...
,
...
p,,2,,
...
...))
...
Returns
...
the
...
exact
...
percentiles
...
p,,1,,,
...
p,,2,,,
...
...
...
of
...
an
...
integer
...
column
...
in
...
the
...
group
...
(does
...
not
...
work
...
with
...
floating
...
point
...
types).
...
p,,i~
...
must
...
be
...
between
...
0
...
and
...
1.
...
double
percentile_approx(col,
...
p
...
,
...
B
...
)
...
Returns
...
an
...
approximate
...
p^th^
...
percentile
...
of
...
a
...
numeric
...
column
...
(including
...
floating
...
point
...
types)
...
in
...
the
...
group.
...
The
...
B
...
parameter
...
controls
...
approximation
...
accuracy
...
at
...
the
...
cost
...
of
...
memory.
...
Higher
...
values
...
yield
...
better
...
approximations,
...
and
...
the
...
default
...
is
...
10,000.
...
When
...
the
...
number
...
of
...
distinct
...
values
...
in
...
col
...
is
...
smaller
...
than
...
B,
...
this
...
gives
...
an
...
exact
...
percentile
...
value.
...
array<double>
percentile_approx(col,
...
array(p~1,,
...
,
...
p,,2_
...
...)
...
,
...
B
...
)
...
Same
...
as
...
above,
...
but
...
accepts
...
and
...
returns
...
an
...
array
...
of
...
percentile
...
values
...
instead
...
of
...
a
...
single
...
one.
...
array<struct {`'x','y'`
...
}
...
>
...
histogram_numeric(col,
...
b)
...
Computes
...
a
...
histogram
...
of
...
a
...
numeric
...
column
...
in
...
the
...
group
...
using
...
b
...
non-uniformly
...
spaced
...
bins.
...
The
...
output
...
is
...
an
...
array
...
of
...
size
...
b
...
of
...
double-valued
...
(x,y)
...
coordinates
...
that
...
represent
...
the
...
bin
...
centers
...
and
...
heights
...
array
collect_set(col)
...
Returns
...
a
...
set
...
of
...
objects
...
with
...
duplicate
...
elements
...
eliminated

Built-in

...

Table-Generating

...

Functions

...

(UDTF)

...

<<Anchor(UDTF)>>
...
Normal
...
user-defined
...
functions,
...
such
...
as
...
concat(),
...
take
...
in
...
a
...
single
...
input
...
row
...
and
...
output
...
a
...
single
...
output
...
row.
...
In
...
contrast,
...
table-generating
...
functions
...
transform
...
a
...
single
...
input
...
row
...
to
...
multiple
...
output
...
rows.
...

explode

explode()

...

takes

...

in

...

an

...

array

...

as

...

an

...

input

...

and

...

outputs

...

the

...

elements

...

of

...

the

...

array

...

as

...

separate

...

rows.

...

UDTF's

...

can

...

be

...

used

...

in

...

the

...

SELECT

...

expression

...

list

...

and

...

as

...

a

...

part

...

of

...

LATERAL

...

VIEW.

...

An

...

example

...

use

...

of

...

explode()

...

in

...

the

...

SELECT

...

expression

...

list

...

is

...

as

...

follows:

...

Consider

...

a

...

table

...

named

...

myTable

...

that

...

has

...

a

...

single

...

column

...

(myCol)

...

and

...

two

...

rows:

...

Array<int>

...

myCol

...

1,2,3

...

4,5,6

...

Then running the query:

SELECT explode(myCol)

...

AS

...

myNewCol

...

FROM

...

myTable;

...

Will

...

produce:

...

int

...

myNewCol

...

1

2

3

4

5

6

Using the syntax "SELECT udtf(col)

...

AS

...

colAlias..."

...

has

...

a

...

few

...

limitations:

...

No

...

other

...

expressions

...

are

...

allowed

...

in

...

SELECT

...

- SELECT

...

- pageid,

...

- explode(adid_list)

...

- AS

...

- myCol...

...

- is

...

- not

...

- supported

...

UDTF's

...

can't

...

be

...

nested

...

- SELECT

...

- explode(explode(adid_list))

...

- AS

...

- myCol...

...

- is

...

- not

...

- supported

...

GROUP

...

BY

...

/

...

CLUSTER

...

BY

...

/

...

DISTRIBUTE

...

BY

...

/

...

SORT

...

BY

...

is

...

not supported
- SELECT explode(adid_list)

...

- AS

...

- myCol

...

- ...

...

- GROUP

...

- BY

...

- myCol

...

- is

...

- not

...

- supported

...

Please

...

see

...

Hive-LanguageManual-LateralView

...

for

...

an

...

alternative

...

syntax

...

that

...

does

...

not

...

have

...

these

...

limitations.

...

The

...

following

...

are

...

built-in

...

table-generating

...

functions

...

are

...

supported

...

in

...

Hive:

...

Return

...

Type*

...

*

...

Name(Signature)*

...

*

...

Description*

...

*

...


myType	explode(array<myType>

...

a)

...

<<Anchor(explode)>>

...

For

...

each

...

element

...

in

...

a,

...

explode()

...

generates

...

a

...

row

...

containing

...

that

...

element

json_tuple

...

A

...

new

...

json_tuple()

...

UDTF

...

is

...

introduced

...

in

...

hive

...

0.7.

...

It

...

takes

...

a

...

set

...

of

...

names

...

(keys)

...

and

...

return

...

a

...

tuple

...

of

...

values

...

in

...

one

...

function.

...

If

...

you

...

are

...

using

...

get_json_object()

...

and

...

want

...

to

...

replace

...

it

...

with

...

json_tuple,

...

the

...

only

...

changes

...

is

...

that

...

your

...

query

...

will

...

be

...

using

...

json_tuple()

...

in

...

lateral

...

view

...

rather

...

than

...

multiple

...

get_json_object()

...

in

...

the

...

select

...

clause.

...

For

...

example,

{

Code Block
} select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a; {code}

should

...

be

...

changed

...

to

{

Code Block
} select a.timestamp, b.* from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2; {code} h2. GROUPing and SORTing on

GROUPing and SORTing on f(column)

...

A

...

typical

...

OLAP

...

pattern

...

is

...

that

...

you

...

have

...

a

...

timestamp

...

column

...

and

...

you

...

want

...

to

...

group

...

by

...

daily

...

or

...

other

...

less

...

granular

...

date

...

windows

...

than

...

by

...

second.

...

So

...

you

...

might

...

want

...

to

...

select

...

concat(year(dt),month(dt))

...

and

...

then

...

group

...

on

...

that

...

concat().

...

But

...

if

...

you

...

attempt

...

to

...

GROUP

...

BY

...

or

...

SORT

...

BY

...

a

...

column

...

on

...

which

...

you've

...

applied

...

a

...

function

...

and

...

alias,

...

like

...

this:

{

Code Block
} select f(col) as fc, count$*$ from table_name group by fc {code}

You

...

will

...

get

...

an

...

error:

{

Code Block
} FAILED: Error in semantic analysis: line 1:69 Invalid Table Alias or Column Reference fc {code}

Because

...

you

...

are

...

not

...

able

...

to

...

GROUP

...

BY

...

or

...

SORT

...

BY

...

a

...

column

...

alias

...

on

...

which

...

a

...

function

...

has

...

been

...

applied.

...

There

...

are

...

two

...

workarounds.

...

First,

...

you

...

can

...

reformulate

...

this

...

query

...

with

...

subqueries,

...

which

...

is

...

somewhat

...

complicated:

{

Code Block
} select sq.fc,col1,col2,...,colN,count$*$ from (select f(col) as fc,col1,col2,...,colN from table_name) sq group by sq.fc,col1,col2,...,colN {code}

Or

...

you

...

can

...

make

...

sure

...

not

...

to

...

use

...

a

...

column

...

alias,

...

which

...

is

...

simpler:

{

Code Block
} select f(col) as fc, count$*$ from table_name group by f(col) {code}

Contact

...

Tim

...

Ellis

...

(tellis)

...

at

...

RiotGames

...

dot

...

com

...

if

...

you

...

would

...

like

...

to

...

discuss

...

this

...

in

...

further

...

detail.

...

Space shortcuts

Child pages

Versions Compared

Old Version 4

New Version 5

Key

Built-in

Operators

Relational

Operators

Arithmetic Operators

Logical Operators

Complex Type Constructors

Operators on Complex Types

Built-in Functions

Mathematical Functions

Collection Functions

Type Conversion Functions

Date Functions

Conditional Functions

String Functions

Misc.

Functions

xpath

get_json_object

A

Built-in

Aggregate

Functions

(UDAF)

Built-in

Table-Generating

Functions

(UDTF)

explode

json_tuple

GROUPing and SORTing on f(column)