Hive Data Types
Overview
This lists all supported data types in Hive.
Numeric Types
Date/Time Types
TIMESTAMP
(Note: Only available starting with Hive 0.8.0)DATE
(Note: Only available starting with Hive 0.12.0)
String Types
Misc Types
BOOLEAN
BINARY
(Note: Only available starting with Hive 0.8.0)
Complex Types
- arrays:
ARRAY<data_type>
- maps:
MAP<primitive_type, data_type>
structs:
STRUCT<col_name : data_type [COMMENT col_comment], ...>
- union:
UNIONTYPE<data_type, data_type, ...>
Column Types
Integral Types (TINYINT
, SMALLINT
, INT
, BIGINT
)
Integral literals are assumed to be INT
by default, unless the number exceeds the range of INT
in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
Type |
Postfix |
Example |
---|---|---|
TINYINT |
Y |
100Y |
SMALLINT |
S |
100S |
BIGINT |
L |
100L |
Strings
String literals can be expressed with either single quotes (') or double quotes ("). Hive uses C-style escaping within the strings.
Varchar
Varchar types are created with a length specifier (between 1 and 65355), which defines the maximum number of characters allowed in the character string. If a string value being converted/assigned to a varchar value exceeds the length specifier, the string is silently truncated. Character length is determined by the number of code points contained by the character string.
Like string, trailing whitespace is significant in varchar and will affect comparison results.
Limitations
Non-generic UDFs cannot directly use varchar type as input arguments or return values. String UDFs can be created instead, and the varchar values will be converted to strings and passed to the UDF. To use varchar arguments directly or to return varchar values, create a GenericUDF.
There may be other contexts which do not support varchar, if they rely on reflection-based methods for retrieving type information. This includes some SerDe implementations.
Version
Varchar datatype was introduced in Hive 0.12.0 (HIVE-4844).
Timestamps
Supports traditional UNIX timestamp with optional nanosecond precision.
Supported conversions:
- Integer numeric types: Interpreted as UNIX timestamp in seconds
- Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
- Strings: JDBC compliant java.sql.Timestamp format "
YYYY-MM-DD HH:MM:SS.fffffffff
" (9 decimal place precision)
Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch. Convenience UDFs for conversion to and from timezones are provided (to_utc_timestamp
, from_utc_timestamp
).
All existing datetime UDFs (month, day, year, hour, etc.) work with the TIMESTAMP
data type.
Version
Timestamps were introduced in Hive 0.8.0 (HIVE-2272).
Dates
DATE
values describe a particular year/month/day, in the form {{YYYY-MM-DD}}. For
example, DATE '2013-01-01'. Date types do not have a time of day component.
The range of values supported for the Date type is be 0000-01-01 to 9999-12-31, dependent on
support by the primitive Java Date type.
Version
Dates were introduced in Hive 0.12.0 (HIVE-4055).
Casting Dates
Date types can only be converted to/from Date, Timestamp, or String types.
Valid casts to/from Date type |
Result |
---|---|
cast(date as date) |
Same date value |
cast(timestamp as date) |
The year/month/day of the timestamp is determined, based on the local timezone, and returned as a date value. |
cast(string as date) |
If the string is in the form 'YYYY-MM-DD', then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned. |
cast(date as timestamp) |
A timestamp value is generated corresponding to midnight of the year/month/day of the date value, based on the local timezone. |
cast(date as string) |
The year/month/day represented by the Date is formatted as a string in the form 'YYYY-MM-DD'. |
Decimals
The DECIMAL
type in Hive is based on Java's BigDecimal which is used for representing immutable arbitrary precision decimal numbers in Java. All regular number operations (e.g. +, -, *, /) and relevant UDFs (e.g. Floor, Ceil, Round, and many more) handle decimal types. You can cast to/from decimal types like you would do with other numeric types. The persistence format of the decimal type supports both scientific and non-scientific notation. Therefore, regardless of whether your dataset contains data like 1E+44 (scientific notation) or 4004 (non-scientific notation) or a combination of both, DECIMAL
can be used for it.
With the present Hive implementation, the Decimal type has precision limited to 38 digits. While this has minimal impact on usability, it's worth noting for performance reasons. HIVE-3796 is tracking the progress of allowing users to specify scale and precision when creating tables with decimal datatype.
The Decimal datatype is discussed further in Floating Point Types below.
Version
Decimal datatype was introduced in Hive 0.11.0 (HIVE-2693).
Union Types
Union types can at any one point hold exactly one of their specified data types. You can create an instance of this type using the create_union
UDF:
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>); SELECT foo FROM union_test; {0:1} {1:2.0} {2:["three","four"]} {3:{"a":5,"b":"five"}} {2:["six","seven"]} {3:{"a":8,"b":"eight"}} {0:9} {1:10.0}
The first part in the deserialized union is the tag which lets us know which part of the union is being used. In this example 0
means the first data_type from the definition which is an int
and so on.
To create a union you have to provide this tag to the create_union
UDF:
SELECT create_union(0, key), create_union(if(key<100, 0, 1), 2.0, value), create_union(1, "a", struct(2, "b")) FROM src LIMIT 2; {0:"238"} {1:"val_238"} {1:{"col1":2,"col2":"b"}} {0:"86"} {0:2.0} {1:{"col1":2,"col2":"b"}}
Literals
Floating Point Types
Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported.
Decimal Types
Version
Decimal datatype was introduced in Hive 0.11.0 (HIVE-2693). See Decimal Datatype above.
Decimal literals provide precise values and greater range for floating point numbers than the DOUBLE type. Decimal data types store exact representations of numeric values, while DOUBLE data types store very close approximations of numeric values.
Decimal types are needed for use cases in which the (very close) approximation of a DOUBLE is insufficient, such as financial applications, equality and inequality checks, and rounding operations. They are also needed for use cases that deal with numbers outside the DOUBLE range (approximately -10308 to 10308) or very close to zero (-10-308 to 10-308). For a general discussion of the limits of the DOUBLE type, see the Wikipedia article Double-precision floating-point format.
The precision of a Decimal type is limited to 38 digits in Hive. See HIVE-4271 and HIVE-4320 for comments about the reasons for choosing this limit.
Using Decimal Types
You can create a table in Hive that uses the Decimal type with the following syntax:
create table decimal_1 (t decimal);
The table decimal_1
is a table having one field of type decimal which is basically a Decimal value.
You can read and write values in such a table using either the LazySimpleSerDe or the LazyBinarySerDe. For example:
alter table decimal_1 set serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';
or:
alter table decimal_1 set serde 'org.apache.hadoop.hive.serde2.lazy.LazyBinarySerDe';
You can use a cast to convert a Decimal value to any other primitive type such as a BOOLEAN. For example:
select cast(t as boolean) from decimal_2;
Mathematical UDFs
Decimal also supports many mathematical UDFs with the same syntax as used in the case of DOUBLE.
Basic mathematical operations that can use decimal types include:
- Positive
- Negative
- Addition
- Subtraction
- Multiplication
- Division
- Average (avg)
- Sum
- Count
- Modulus (pmod)
These rounding functions can also take decimal types:
- Floor
- Ceiling
- Round
But these mathematical UDFs are not currently supported:
- Exp
- Log (ln)
- Log2
- Log10
- Sqrt
Power(decimal, n) only supports positive integer values for the exponent n.
Casting Decimal Values
Casting is supported between decimal values and any other primitive type such as integer, double, boolean, and so on.
Testing Decimal Types
Two new tests have been added as part of the TestCliDriver framework within Hive. They are decimal_1.q and decimal_2.q. Other tests such as udf7.q cover the gamut of UDFs mentioned above.
More tests need to be added that demonstrate failure or when certain types of casts are prevented (for example, casting to date). There is some ambiguity in the round function because the rounding of Decimal does not work exactly as the SQL standard, and therefore it has been omitted in the current work.
For general information about running Hive tests, see How to Contribute to Apache Hive and Hive Developer FAQ.
Handling of NULL Values
Missing values are represented by the special value NULL. To import data with NULL fields, check documentation of the SerDe used by the table. (The default Text Format uses LazySimpleSerDe which interprets the string \N as NULL when importing.)
Allowed Implicit Conversions
|
void |
boolean |
tinyint |
smallint |
int |
bigint |
float |
double |
decimal |
string |
varchar |
timestamp |
date |
binary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
void to |
true |
true |
true |
true |
true |
true |
true |
true |
true |
true |
true |
true |
true |
true |
boolean to |
false |
true |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
tinyint to |
false |
false |
true |
true |
true |
true |
true |
true |
true |
true |
true |
false |
false |
false |
smallint to |
false |
false |
false |
true |
true |
true |
true |
true |
true |
true |
true |
false |
false |
false |
int to |
false |
false |
false |
false |
true |
true |
true |
true |
true |
true |
true |
false |
false |
false |
bigint to |
false |
false |
false |
false |
false |
true |
true |
true |
true |
true |
true |
false |
false |
false |
float to |
false |
false |
false |
false |
false |
false |
true |
true |
true |
true |
true |
false |
false |
false |
double to |
false |
false |
false |
false |
false |
false |
false |
true |
true |
true |
true |
false |
false |
false |
decimal to |
false |
false |
false |
false |
false |
false |
false |
false |
true |
true |
true |
false |
false |
false |
string to |
false |
false |
false |
false |
false |
false |
false |
true |
true |
true |
true |
false |
false |
false |
varchar to |
false |
false |
false |
false |
false |
false |
false |
true |
true |
true |
true |
false |
false |
false |
timestamp to |
false |
false |
false |
false |
false |
false |
false |
false |
false |
true |
true |
true |
false |
false |
date to |
false |
false |
false |
false |
false |
false |
false |
false |
false |
true |
true |
false |
true |
false |
binary to |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
false |
true |