...

  • Types are associated with the columns in the tables. The following primitive types are supported:
  • Integers
    • TINYINT: 1 byte integer
    • SMALLINT: 2 byte integer
    • INT: 4 byte integer
    • BIGINT: 8 byte integer
  • Boolean type
    • BOOLEAN: TRUE/FALSE
  • Floating point numbers
    • FLOAT: single precision
    • DOUBLE: double precision
  • Fixed point numbers
    • DECIMAL: a fixed point value of user defined scale and precision
  • String types
    • STRING: sequence of characters in a specified character set
    • VARCHAR: sequence of characters in a specified character set with a maximum length
    • CHAR: sequence of characters in a specified character set with a defined length
  • Date and time types
    • TIMESTAMP: a date and time without a timezone ("LocalDateTime" semantics)
    • TIMESTAMP WITH LOCAL TIME ZONE: a specific point in time, measured down to nanoseconds ("Instant" semantics)
    • DATE: a date
  • Binary types
    • BINARY: a sequence of bytes
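
As a rough illustration of the sizes listed above, the following Java sketch prints the width and range of the Java primitives that have the same byte widths as the Hive integer types, and fixes a decimal value to a chosen scale. The class name, the `toScale` helper, and the HALF_UP rounding mode are assumptions made for this example, not something the list above specifies.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class PrimitiveTypeSketch {
    // Fixes a literal to the given scale, the way a fixed point value with that scale holds it.
    // HALF_UP rounding is an assumption of this sketch.
    static BigDecimal toScale(String literal, int scale) {
        return new BigDecimal(literal).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Java primitives with the same byte widths as the Hive integer types.
        System.out.println("TINYINT  / byte : " + Byte.BYTES + " byte, " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("SMALLINT / short: " + Short.BYTES + " bytes, " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("INT      / int  : " + Integer.BYTES + " bytes, " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("BIGINT   / long : " + Long.BYTES + " bytes, " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        // A fixed point value with scale 2.
        System.out.println(toScale("12.345", 2)); // 12.35
    }
}
```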

...

  • gender, which is a STRING.
  • active, which is a BOOLEAN.

Timestamp Types

Timestamps have been the source of much confusion, so we try to document the intended semantics of Hive.

Timestamp with local time zone ("Instant" semantics)

Java's "Instant" timestamps define a set point in time that remains constant regardless of where the data is read.

This timestamp defines a particular point in time, which is mapped into the local timezone. Thus, "2014-12-12 12:34:56" when written in EST, will become "2014-12-12 09:34:56" when read in PST.
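
The EST to PST example above can be reproduced with `java.time`. This is only a sketch of the "Instant" semantics, not Hive's implementation; the class and method names are made up for illustration, and the zone IDs `America/New_York` and `America/Los_Angeles` stand in for EST and PST.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class InstantSemanticsSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Interprets wallClock in writerZone, then renders the same instant in readerZone.
    static String readIn(String wallClock, String writerZone, String readerZone) {
        ZonedDateTime written =
                LocalDateTime.parse(wallClock, FORMAT).atZone(ZoneId.of(writerZone));
        return written.withZoneSameInstant(ZoneId.of(readerZone)).format(FORMAT);
    }

    public static void main(String[] args) {
        // Written on the US east coast, read on the US west coast.
        System.out.println(
                readIn("2014-12-12 12:34:56", "America/New_York", "America/Los_Angeles"));
        // prints 2014-12-12 09:34:56
    }
}
```

The point in time is the same for the writer and the reader; only the rendering changes with the reader's zone.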

Timestamp ("LocalDateTime" semantics)

Java's "LocalDateTime" timestamps record a date and time as year, month, day, hour, minute, and seconds without a timezone. These timestamps always have those same values regardless of the local time zone.

For example, the timestamp value of "2014-12-12 12:34:56" is decomposed into year, month, day, hour, minute and seconds fields, but with no time zone information available, it does not correspond to any specific instant. It will always be the same value regardless of the local time zone. Unless your application uses UTC consistently, timestamp with local time zone is strongly preferred over timestamp for most applications. When users say an event is at 10:00, it is always in reference to a certain timezone and means a point in time, rather than 10:00 in an arbitrary time zone.


Type                            Value in America/Los_Angeles  Value in America/New_York
timestamp                       2014-12-12 12:34:56           2014-12-12 12:34:56
timestamp with local time zone  2014-12-12 12:34:56           2014-12-12 15:34:56
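
To see why a plain timestamp does not correspond to a specific instant, the following Java sketch resolves the same wall-clock value in the two zones from the comparison above. The class and method names are hypothetical; this illustrates "LocalDateTime" semantics and is not Hive code.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class LocalDateTimeSemanticsSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // A LocalDateTime carries no zone, so it renders the same wherever it is read.
    static String localValue(String wallClock) {
        return LocalDateTime.parse(wallClock, FORMAT).format(FORMAT);
    }

    // Epoch second that the wall-clock value denotes once a zone is attached to it.
    static long epochSecondIn(String wallClock, String zone) {
        return LocalDateTime.parse(wallClock, FORMAT).atZone(ZoneId.of(zone)).toEpochSecond();
    }

    public static void main(String[] args) {
        String value = "2014-12-12 12:34:56";
        System.out.println(localValue(value)); // same text in every zone

        long losAngeles = epochSecondIn(value, "America/Los_Angeles");
        long newYork = epochSecondIn(value, "America/New_York");
        // The same fields denote instants three hours apart in these two zones.
        System.out.println((losAngeles - newYork) / 3600 + " hours apart");
    }
}
```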


Built In Operators and Functions

...

  • Relational Operators: The following operators compare the passed operands and generate a TRUE or FALSE value, depending on whether the comparison between the operands holds or not.

Relational Operator

Operand types

Description

A = B

all primitive types

TRUE if expression A is equivalent to expression B; otherwise FALSE

A != B

all primitive types

TRUE if expression A is not equivalent to expression B; otherwise FALSE

A < B

all primitive types

TRUE if expression A is less than expression B; otherwise FALSE

A <= B

all primitive types

TRUE if expression A is less than or equal to expression B; otherwise FALSE

A > B

all primitive types

TRUE if expression A is greater than expression B; otherwise FALSE

A >= B

all primitive types

TRUE if expression A is greater than or equal to expression B; otherwise FALSE

A IS NULL

all types

TRUE if expression A evaluates to NULL; otherwise FALSE

A IS NOT NULL

all types

FALSE if expression A evaluates to NULL; otherwise TRUE

A LIKE B

strings

TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to . in POSIX regular expressions), and the % character in B matches an arbitrary number of characters in A (similar to .* in POSIX regular expressions). For example, 'foobar' LIKE 'foo' evaluates to FALSE, whereas 'foobar' LIKE 'foo___' evaluates to TRUE, and so does 'foobar' LIKE 'foo%'. To escape %, use \ (\% matches one % character). If the data contains a semicolon and you want to search for it, it needs to be escaped: columnValue LIKE 'a\;b'

A RLIKE B

strings

NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B (see Java regular expressions syntax), otherwise FALSE. For example, 'foobar' rlike 'foo' evaluates to TRUE and so does 'foobar' rlike '^f.*r$'.

A REGEXP B

strings

Same as RLIKE
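
The difference between LIKE and RLIKE can be sketched in Java, since RLIKE uses Java regular expressions. The helper below translates a LIKE pattern into a Java regular expression and requires a full match, while the RLIKE stand-in accepts a match on any substring. All names are hypothetical and this is not Hive's implementation; in particular, NULL operands are not handled here.

```java
import java.util.regex.Pattern;

final class LikeRlikeSketch {
    // Translates a SQL LIKE pattern into an equivalent Java regular expression.
    static String likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '\\' && i + 1 < likePattern.length()) {
                // A backslash makes the next character literal, for example \% or \_.
                regex.append(Pattern.quote(String.valueOf(likePattern.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*"); // any number of characters
            } else if (c == '_') {
                regex.append('.');  // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    // LIKE: the pattern must match the whole string.
    static boolean like(String a, String pattern) {
        return Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(a).matches();
    }

    // RLIKE: TRUE when any substring of a matches the Java regular expression.
    static boolean rlike(String a, String regex) {
        return Pattern.compile(regex).matcher(a).find();
    }

    public static void main(String[] args) {
        System.out.println(like("foobar", "foo"));      // false
        System.out.println(like("foobar", "foo___"));   // true
        System.out.println(like("foobar", "foo%"));     // true
        System.out.println(rlike("foobar", "foo"));     // true
        System.out.println(rlike("foobar", "^f.*r$"));  // true
    }
}
```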

  • Arithmetic Operators: The following operators support various common arithmetic operations on the operands. All of them return number types.

Arithmetic Operators

Operand types

Description

A + B

all number types

Gives the result of adding A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands. For example, since every integer is a float, float is a containing type of integer, so the + operator on a float and an int results in a float.

A - B

all number types

Gives the result of subtracting B from A. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.

A * B

all number types

Gives the result of multiplying A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands. Note that if the multiplication causes an overflow, you will have to cast one of the operands to a type higher in the type hierarchy.

A / B

all number types

Gives the result of dividing A by B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands. If the operands are integer types, then the result is the quotient of the division.

A % B

all number types

Gives the remainder resulting from dividing A by B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.

A & B

all number types

Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.

A | B

all number types

Gives the result of bitwise OR of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.

A ^ B

all number types

Gives the result of bitwise XOR of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.

~A

all number types

Gives the result of bitwise NOT of A. The type of the result is the same as the type of A.
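
Java's numeric promotion behaves much like the "common parent" rule described above, so it can serve as an analogy. The sketch below shows an int widened to float in an addition, and an int multiplication that overflows unless one operand is cast to a wider type first. It is an illustration in Java under that analogy, with hypothetical names; that Hive wraps around on overflow in exactly the same way is an assumption.

```java
final class ArithmeticPromotionSketch {
    // int + float: the int operand is widened, so the result type is float.
    static float addIntAndFloat(int a, float b) {
        return a + b;
    }

    // int * int stays int and silently wraps around on overflow.
    static int multiplyAsInt(int a, int b) {
        return a * b;
    }

    // Casting one operand to a wider type makes the whole multiplication wider.
    static long multiplyAsLong(int a, int b) {
        return (long) a * b;
    }

    public static void main(String[] args) {
        System.out.println(addIntAndFloat(1, 2.5f));                // 3.5
        System.out.println(multiplyAsInt(1_000_000, 1_000_000));    // -727379968 (overflow)
        System.out.println(multiplyAsLong(1_000_000, 1_000_000));   // 1000000000000
        System.out.println(7 % 3);                                  // 1
        System.out.println((6 & 3) + " " + (6 | 3) + " " + (6 ^ 3) + " " + (~6)); // 2 7 5 -7
    }
}
```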

  • Logical Operators: The following operators provide support for creating logical expressions. All of them return boolean TRUE or FALSE depending upon the boolean values of the operands.

Logical Operators

Operand types

Description

A AND B

boolean

TRUE if both A and B are TRUE, otherwise FALSE

A && B

boolean

Same as A AND B

A OR B

boolean

TRUE if either A or B or both are TRUE, otherwise FALSE

A || B

boolean

Same as A OR B

NOT A

boolean

TRUE if A is FALSE, otherwise FALSE

!A

boolean

Same as NOT A

  • Operators on Complex Types: The following operators provide mechanisms to access elements in Complex Types.

Operator

Operand types

Description

A[n]

A is an Array and n is an int

returns the nth element in the array A. The first element has index 0. For example, if A is an array comprising ['foo', 'bar'], then A[0] returns 'foo' and A[1] returns 'bar'

M[key]

M is a Map<K, V> and key has type K

returns the value corresponding to the key in the map. For example, if M is a map comprising {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, then M['all'] returns 'foobar'

S.x

S is a struct

returns the x field of S. For example, for struct foobar {int foo, int bar}, foobar.foo returns the integer stored in the foo field of the struct.
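
For readers more familiar with Java collections, the three access operators above map roughly onto `List.get`, `Map.get`, and field access. The sketch below uses the values from the descriptions; the class names and the integers stored in the struct stand-in (1 and 2) are made up for illustration. It assumes Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.List;
import java.util.Map;

final class ComplexTypeAccessSketch {
    // Stand-in for: struct foobar {int foo, int bar}
    static final class Foobar {
        final int foo;
        final int bar;

        Foobar(int foo, int bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    static final List<String> A = List.of("foo", "bar");
    static final Map<String, String> M = Map.of("f", "foo", "b", "bar", "all", "foobar");
    static final Foobar S = new Foobar(1, 2); // hypothetical field values

    public static void main(String[] args) {
        System.out.println(A.get(0));     // A[0]     -> foo
        System.out.println(A.get(1));     // A[1]     -> bar
        System.out.println(M.get("all")); // M['all'] -> foobar
        System.out.println(S.foo);        // S.foo    -> 1
    }
}
```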

Built In Functions

Return Type

Function Name (Signature)

Description

BIGINT

round(double a)

returns the rounded BIGINT value of the double

BIGINT

floor(double a)

returns the maximum BIGINT value that is equal to or less than the double

BIGINT

ceil(double a)

returns the minimum BIGINT value that is equal to or greater than the double

double

rand(), rand(int seed)

returns a random number (that changes from row to row). Specifying the seed will make sure the generated random number sequence is deterministic.

string

concat(string A, string B,...)

returns the string resulting from concatenating B after A. For example, concat('foo', 'bar') results in 'foobar'. This function accepts an arbitrary number of arguments and returns the concatenation of all of them.

string

substr(string A, int start)

returns the substring of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar'

string

substr(string A, int start, int length)

returns the substring of A starting from start position with the given length. For example, substr('foobar', 4, 2) results in 'ba'

string

upper(string A)

returns the string resulting from converting all characters of A to upper case, for example, upper('fOoBaR') results in 'FOOBAR'

string

ucase(string A)

Same as upper

string

lower(string A)

returns the string resulting from converting all characters of A to lower case, for example, lower('fOoBaR') results in 'foobar'

string

lcase(string A)

Same as lower

string

trim(string A)

returns the string resulting from trimming spaces from both ends of A, for example, trim(' foobar ') results in 'foobar'

string

ltrim(string A)

returns the string resulting from trimming spaces from the beginning (left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '

string

rtrim(string A)

returns the string resulting from trimming spaces from the end (right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'

string

regexp_replace(string A, string B, string C)

returns the string resulting from replacing all substrings in A that match the Java regular expression B (see Java regular expressions syntax) with C. For example, regexp_replace('foobar', 'oo|ar', '') returns 'fb'

int

size(Map<K, V>)

returns the number of elements in the map type

int

size(Array<T>)

returns the number of elements in the array type

value of <type>

cast(<expr> as <type>)

converts the result of the expression expr to <type>, for example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed.

string

from_unixtime(int unixtime)

converts the number of seconds from the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone, in the format "1970-01-01 00:00:00"

string

to_date(string timestamp)

Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"

int

year(string date)

Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970

int

month(string date)

Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11

int

day(string date)

Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1

string

get_json_object(string json_string, string path)

Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid.
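
A few of the functions above can be approximated with plain Java to make their semantics concrete. This is a sketch under stated assumptions, not Hive's implementation: the class and method names are hypothetical, `substr` only handles start positions of 1 or greater, and `fromUnixtime` takes the time zone as an explicit argument instead of using the system time zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class BuiltInFunctionSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // substr(A, start): positions are 1-based here, while Java indexes are 0-based.
    static String substr(String a, int start) {
        return a.substring(start - 1);
    }

    // substr(A, start, length): clamps at the end of the string.
    static String substr(String a, int start, int length) {
        int begin = start - 1;
        return a.substring(begin, Math.min(a.length(), begin + length));
    }

    // regexp_replace(A, B, C): replaces every match of the Java regex B in A with C.
    static String regexpReplace(String a, String regex, String replacement) {
        return a.replaceAll(regex, replacement);
    }

    // cast(expr as BIGINT): returns null when the conversion does not succeed.
    static Long castToBigint(String expr) {
        try {
            return Long.valueOf(expr);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // floor(double a) and ceil(double a) as BIGINT-like long values.
    static long floor(double a) {
        return (long) Math.floor(a);
    }

    static long ceil(double a) {
        return (long) Math.ceil(a);
    }

    // from_unixtime(unixtime), with the zone passed explicitly for this sketch.
    static String fromUnixtime(long seconds, ZoneId zone) {
        return Instant.ofEpochSecond(seconds).atZone(zone).format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(substr("foobar", 4));                   // bar
        System.out.println(substr("foobar", 4, 2));                // ba
        System.out.println(regexpReplace("foobar", "oo|ar", ""));  // fb
        System.out.println(castToBigint("1"));                     // 1
        System.out.println(castToBigint("abc"));                   // null
        System.out.println(fromUnixtime(0, ZoneId.of("UTC")));     // 1970-01-01 00:00:00
    }
}
```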

  • The following built in aggregate functions are supported in Hive:

Return Type

Aggregation Function Name (Signature)

Description

BIGINT

count(*), count(expr), count(DISTINCT expr[, expr...])

count(*): returns the total number of retrieved rows, including rows containing NULL values; count(expr): returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]): returns the number of rows for which the supplied expression(s) are unique and non-NULL.

DOUBLE

sum(col), sum(DISTINCT col)

returns the sum of the elements in the group or the sum of the distinct values of the column in the group

DOUBLE

avg(col), avg(DISTINCT col)

returns the average of the elements in the group or the average of the distinct values of the column in the group

DOUBLE

min(col)

returns the minimum value of the column in the group

DOUBLE

max(col)

returns the maximum value of the column in the group
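
The three forms of count differ only in how they treat NULL and duplicate values, which the following Java sketch makes explicit over a single in-memory column. The class and method names are hypothetical, and `sum` and `avg` skipping NULL values follows standard SQL behavior, which is assumed here since the table above does not state it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class AggregateSketch {
    // count(*): every row counts, including rows holding NULL.
    static long countStar(List<?> rows) {
        return rows.size();
    }

    // count(expr): only rows where the expression is non-NULL.
    static long countExpr(List<?> rows) {
        return rows.stream().filter(Objects::nonNull).count();
    }

    // count(DISTINCT expr): unique non-NULL values.
    static long countDistinct(List<?> rows) {
        return rows.stream().filter(Objects::nonNull).distinct().count();
    }

    // sum(col) as a double, skipping NULL values.
    static double sum(List<Double> col) {
        return col.stream().filter(Objects::nonNull).mapToDouble(Double::doubleValue).sum();
    }

    // avg(col) as a double, skipping NULL values; NaN stands in for "no rows".
    static double avg(List<Double> col) {
        return col.stream().filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", null, "b", "a");
        System.out.println(countStar(rows));     // 4
        System.out.println(countExpr(rows));     // 3
        System.out.println(countDistinct(rows)); // 2

        List<Double> col = Arrays.asList(1.0, null, 2.0);
        System.out.println(sum(col)); // 3.0
        System.out.println(avg(col)); // 1.5
    }
}
```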

Language Capabilities

Hive's SQL provides the basic SQL operations. These operations work on tables or partitions. These operations are:

...