...
RHive Operators and User-Defined Functions (UDFs)
Table of Contents |
---|
Tip | ||
---|---|---|
| ||
All Hive keywords are case-insensitive, including the names of Hive operators and functions. |
...
Operator | Operand types | Description |
---|---|---|
A AND B | boolean | TRUE if both A and B are TRUE, otherwise FALSE. NULL if A or B is NULL. |
A OR B | boolean | TRUE if either A or B or both are TRUE, FALSE OR NULL is NULL, otherwise FALSE. |
NOT A | boolean | TRUE if A is FALSE or NULL if A is NULL. Otherwise FALSE. |
! A | boolean | Same as NOT A. |
A IN (val1, val2, ...) | boolean | TRUE if A is equal to any of the values. As of Hive 0.13 subqueries are supported in IN statements. |
A NOT IN (val1, val2, ...) | boolean | TRUE if A is not equal to any of the values. As of Hive 0.13 subqueries are supported in NOT IN statements. |
[NOT] EXISTS (subquery) | TRUE if the the subquery returns at least one row. Supported as of Hive 0.13. |
String Operators
Operator | Operand types | Description |
---|---|---|
A || B | strings | Concatenates the operands - shorthand for |
...
Return Type | Name(Signature) | Description |
---|---|---|
varies | java_method(class, method[, arg1[, arg2..]]) | Synonym for |
varies | reflect(class, method[, arg1[, arg2..]]) | Calls a Java method by matching the argument signature, using reflection. (As of Hive 0.7.0.) See Reflect (Generic) UDF for examples. |
int | hash(a1[, a2...]) | Returns a hash value of the arguments. (As of Hive 0.4.) |
string | current_user() | Returns current user name from the configured authenticator manager (as of Hive 1.2.0). Could be the same as the user provided when connecting, but with some authentication managers (for example HadoopDefaultAuthenticator) it could be different. |
string | logged_in_user() | Returns current user name from the session state (as of Hive 2.2.0). This is the username provided when connecting to Hive. |
string | current_database() | Returns current database name (as of Hive 0.13.0). |
string | md5(string/binary) | Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: md5('ABC') = '902fbdd2b1df0c4f70b4a5d23525e932'. |
string | sha1(string/binary) sha(string/binary) | Calculates the SHA-1 digest for string or binary and returns the value as a hex string (as of Hive 1.3.0). Example: sha1('ABC') = '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'. |
bigint | crc32(string/binary) | Computes a cyclic redundancy check value for string or binary argument and returns bigint value (as of Hive 1.3.0). Example: crc32('ABC') = 2743272264. |
string | sha2(string/binary, int) | Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512) (as of Hive 1.3.0). The first argument is the string or binary to be hashed. The second argument indicates the desired bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). SHA-224 is supported starting from Java 8. If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL. Example: sha2('ABC', 256) = 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'. |
binary | aes_encrypt(input string/binary, key string/binary) | Encrypt input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bits keys can be used if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: base64(aes_encrypt('ABC', '1234567890123456')) = 'y6Ss+zCYObpCbgfWfyNWTw=='. |
binary | aes_decrypt(input binary, key string/binary) | Decrypt input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bits keys can be used if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456') = 'ABC'. |
string | version() | Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build. |
bigint | surrogate_key(write_id_bits, task_id_bits]) | Automatically generate numerical Ids for rows as you enter data into a table. Can only be used as default value for acid or insert-only tables. |
xpath
The following functions are described in LanguageManual XPathUDF:
...
Row-set columns types | Name(Signature) | Description | ||
---|---|---|---|---|
T | explode(ARRAY<T> a) | Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array. | ||
Tkey,Tvalue | explode(MAP<Tkey,Tvalue> m) | Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.). | ||
int,T | posexplode(ARRAY<T> a) | Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array. | ||
T1,...,Tn | inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a) | Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.) | ||
T1,...,Tn/r | stack(int r,T1 V1,...,Tn/r Vn) | Breaks up n values V1,...,Vn into r rows. Each row will have n/r columns. r must be constant. | ||
string1,...,stringn | json_tuple(string jsonStr,string k1,...,string kn) | Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the | ||
string 1,...,stringn | parse_url_tuple(string urlStr,string p1,...,string pn) | Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the |
Usage Examples
explode (array)
...
HTML | ||
---|---|---|
| ||
<table><tbody><tr><th>col</th></tr><tr><td>A</td></tr><tr><td>B</td></tr><tr><td>C</td></tr></tbody></table> |
explode (map)
Code Block | ||
---|---|---|
| ||
select explode(map('A',10,'B',20,'C',30)); select explode(map('A',10,'B',20,'C',30)) as (key,value); select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf; select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value; |
...
HTML | ||
---|---|---|
| ||
<table><tbody><tr><th>key</th><th>value</th></tr><tr><td>A</td><td>10</td></tr><tr><td>B</td><td>20</td></tr><tr><td>C</td><td>30</td></tr></tbody></table> |
...
posexplode (array)
...
Code Block | ||
---|---|---|
| ||
select posexplode(array('A','B','C')); select posexplode(array('A','B','C')) as (pos,val); select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf; select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val; |
HTML | ||
---|---|---|
| ||
<table><tbody><tr><th>pos</th><th>val</th></tr><tr><td>0</td><td>A</td></tr><tr><td>1</td><td>B</td></tr><tr><td>2</td><td>C</td></tr></tbody></table> |
inline (array of structs)
...
HTML | ||
---|---|---|
| ||
<table><tbody><tr><th>col1</th><th>col2</th><th>col3</th></tr><tr><td>A</td><td>10</td><td>2015-01-01</td></tr><tr><td>B</td><td>20</td><td>2016-02-02</td></tr></tbody></table> |
stack (values)
Code Block | ||
---|---|---|
| ||
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01'); select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2); select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf; select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2; |
HTML | ||
---|---|---|
| ||
<table><tbody><tr><th>col0</th><th>col1</th><th>col2</th></tr><tr><td>A</td><td>10</td><td>2015-01-01</td></tr><tr><td>B</td><td>20</td><td>2016-01-01</td></tr></tbody></table> |
Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations:
...
For information about how to create a custom UDF, see Hive Plugins and Create Function.
select explode(array('A','B','C'));select explode(array('A','B','C')) as col;select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;