...
Numeric Types
TINYINT
(1-byte signed integer, from-128
to127
)SMALLINT
(2-byte signed integer, from-32,768
to32,767
)INT/INTEGER (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
BIGINT
(8-byte signed integer, from-9,223,372,036,854,775,808
to9,223,372,036,854,775,807
)FLOAT
(4-byte single precision floating point number)DOUBLE
(8-byte double precision floating point number)DOUBLE PRECISION (alias for DOUBLE, only available starting with Hive 2.2.0)
DECIMAL
NUMERIC
(same asDECIMAL
, starting with Hive 3.0.0)
Date/Time Types
TIMESTAMP
(Note: Only available starting with Hive 0.8.0)DATE
(Note: Only available starting with Hive 0.12.0)INTERVAL
(Note: Only available starting with Hive 1.2.0)
String Types
STRING
VARCHAR
(Note: Only available starting with Hive 0.12.0)CHAR
(Note: Only available starting with Hive 0.13.0)
...
Integral literals are assumed to be INT
by default, unless the number exceeds the range of INT
in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
Type | Postfix | Example |
---|---|---|
TINYINT | Y | 100Y |
SMALLINT | S | 100S |
BIGINT | L | 100L |
Info | ||
---|---|---|
| ||
|
...
Varchar types are created with a length specifier (between 1 and 6535565535), which defines the maximum number of characters allowed in the character string. If a string value being converted/assigned to a varchar value exceeds the length specifier, the string is silently truncated. Character length is determined by the number of code points contained by the character string.
...
Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch. Convenience UDFs for conversion to and from timezones are provided (to_utc_timestamp
, from_utc_timestamp
).
All existing datetime UDFs (month, day, year, hour, etc.) work with the TIMESTAMP
data type.
Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]
. If they are in another format, declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.
On the table level, alternative timestamp formats can be supported by providing the format to the SerDe property "timestamp.formats" (as of release 1.2.0 with HIVE-9298). For example, yyyy-MM-dd'T'HH:mm:ss.SSS,yyyy-MM-dd'T'HH:mm:ss.
Info | ||
---|---|---|
| ||
Timestamps were introduced in Hive 0.8.0 (HIVE-2272). |
Timestamps in Parquet files may be stored as int64 (as opposed to int96) by setting hive.parquet.write.int64.timestamp=true
and hive.parquet.timestamp.time.unit
to a default storage time unit. ("nanos", "micros",
"millis"
; default: "micros"
). Note that because only 64 bits are stored, int64 timestamps stored as "nanos"
will be stored as NULL if outside the range of 1677-09-21T00:12:43.15 and 2262-04-11T23:47:16.8.
On the table level, alternative timestamp formats can be supported by providing the format to the SerDe property "timestamp.formats" (as of release 1.2.0 with HIVE-9298). For example, yyyy-MM-dd'T'HH:mm:ss.SSS,yyyy-MM-dd'T'HH:mm:ss
Dates Anchor
DATE
values describe a particular year/month/day, in the form YYYY-MM-DD
. For example, DATE '2013-01-01'. Date types do not have a time of day component. The range of values supported for the Date type is 0000-01-01 to 9999-12-31, dependent on support by the primitive Java Date type.
Info | ||
---|---|---|
| ||
Dates Timestamps were introduced in Hive 0.128.0 (HIVE-40552272). |
...
Dates
...
Date types can only be converted to/from Date, Timestamp, or String types.
...
Valid casts to/from Date type
...
Result
...
cast(date as date)
...
Same date value
Anchor | ||||
---|---|---|---|---|
|
Anchor | ||||
---|---|---|---|---|
|
DATE
values describe a particular
...
cast(timestamp as date)
...
year/month/day
...
cast(string as date)
...
If the string is in the form 'YYYY-MM-DD', then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned.
...
cast(date as timestamp)
...
A timestamp value is generated corresponding to midnight of the year/month/day of the date value, based on the local timezone.
...
cast(date as string)
...
The year/month/day represented by the Date is formatted as a string in the form 'YYYY-MM-DD'.
, in the form YYYY-MM-DD
. For example, DATE '2013-01-01'. Date types do not have a time of day component. The range of values supported for the Date type is 0000-01-01 to 9999-12-31, dependent on support by the primitive Java Date type.
Info | ||
---|---|---|
| ||
Dates were introduced in Hive 0.12.0 (HIVE-4055). |
Casting Dates
Date types can only be converted to/from Date, Timestamp, or String types. Casting with user-specified formats is documented here.
Valid casts to/from Date type | Result |
---|---|
cast(date as date) | Same date value |
cast(timestamp as date) | The year/month/day of the timestamp is determined, based on the local timezone, and returned as a date value. |
cast(string as date) | If the string is in the form 'YYYY-MM-DD', then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned. |
cast(date as timestamp) | A timestamp value is generated corresponding to midnight of the year/month/day of the date value, based on the local timezone. |
cast(date as string) | The year/month/day represented by the Date is formatted as a string in the form 'YYYY-MM-DD'. |
Intervals
Supported Interval Description | Example | Meaning | Since |
---|---|---|---|
Intervals of time units: SECOND / MINUTE / DAY / MONTH / YEAR | INTERVAL '1' DAY | an interval of 1 day(s) |
Intervals
Supported Interval Description
Example
Intervals of time units:
SECOND / MINUTE / DAY / MONTH / YEAR
INTERVAL '1' DAY
Hive 1.2.0 (HIVE-9792).
Year to month intervals, format: SY-M
S: optional sign (+/-)
Y: year count
M: month count
shorthand for:
INTERVAL '1' YEAR +INTERVAL '2' MONTH
Hive 1.2.0 (HIVE-9792). |
Year to |
month intervals, format: |
SY-M S: optional sign (+/-) |
Y: year count |
S: seconds
nnnnnn: optional nanotime
month count | INTERVAL '1 |
-2' YEAR TO MONTH | shorthand for: INTERVAL '1' |
YEAR + |
INTERVAL '3' MINUTE +
INTERVAL '4' SECOND +
INTERVAL '5' NANO
MONTH | Hive 1.2.0 (HIVE-9792). |
aids query readability / portability
Support for intervals with expressions:
this may involve other functions/columns.
The expression must return with a number (which is not floating-point) or with a string.
Optional usage of interval keyword
Note |
---|
the usage of the INTERVAL keyword is mandatory |
1 DAY
'1-2' YEAR TO MONTH
INTERVAL 1 DAY
INTERVAL '1-2' YEARS TO MONTH
Day to second intervals, format: SD H:M:S.nnnnnn S: optional sign (+/-) D: day countH: hoursM: minutes S: seconds nnnnnn: optional nanotime | INTERVAL '1 2:3:4.000005' DAY | shorthand for: INTERVAL '1' DAY+ | Hive 1 |
.2.0 (HIVE- |
Add timeunit aliases to aid portability / readability:
SECONDS / MINUTES / HOURS / DAYS / WEEKS / MONTHS / YEARS
9792). |
Support for intervals with constant numbers | INTERVAL 1 DAY | aids query readability / portability |
Hive 2.2.0 (HIVE-13557). |
...
title | Version |
---|
...
Support for intervals with expressions: | INTERVAL (1+dt) DAY | enables dynamic intervals | Hive 2.2.0 (HIVE- |
...
The DECIMAL
type in Hive is based on Java's BigDecimal which is used for representing immutable arbitrary precision decimal numbers in Java. All regular number operations (e.g. +, -, *, /) and relevant UDFs (e.g. Floor, Ceil, Round, and many more) handle decimal types. You can cast to/from decimal types like you would do with other numeric types. The persistence format of the decimal type supports both scientific and non-scientific notation. Therefore, regardless of whether your dataset contains data like 4.004E+3 (scientific notation) or 4004 (non-scientific notation) or a combination of both, DECIMAL
can be used for it.
- Hive 0.11 and 0.12 have the precision of the
DECIMAL
type fixed and limited to 38 digits. - As of Hive 0.13 users can specify scale and precision when creating tables with the
DECIMAL
datatype using aDECIMAL(precision, scale)
syntax. If scale is not specified, it defaults to 0 (no fractional digits). If no precision is specified, it defaults to 10.
...
CREATE TABLE foo (
a DECIMAL, -- Defaults to decimal(10,0)
b DECIMAL(9, 7)
)
13557). | |||||
Optional usage of interval keyword
| 1 DAY | INTERVAL 1 DAY | Hive 2.2.0 (HIVE-13557). | ||
Add timeunit aliases to aid portability / readability: SECONDS / MINUTES / HOURS / DAYS / WEEKS / MONTHS / YEARS | 2 SECONDS | 2 SECOND | Hive 2.2.0 (HIVE-13557). |
Decimals
Anchor | ||||
---|---|---|---|---|
|
Anchor | ||||
---|---|---|---|---|
|
Info | ||
---|---|---|
| ||
Decimal datatype was introduced in Hive 0.11.0 (HIVE-2693) and revised in Hive 0.13.0 (HIVE-3976).
|
The DECIMAL
type in Hive is based on Java's BigDecimal which is used for representing immutable arbitrary precision decimal numbers in Java. All regular number operations (e.g. +, -, *, /) and relevant UDFs (e.g. Floor, Ceil, Round, and many more) handle decimal types. You can cast to/from decimal types like you would do with other numeric types. The persistence format of the decimal type supports both scientific and non-scientific notation. Therefore, regardless of whether your dataset contains data like 4.004E+3 (scientific notation) or 4004 (non-scientific notation) or a combination of both, DECIMAL
can be used for it.
- Hive 0.11 and 0.12 have the precision of the
DECIMAL
type fixed and limited to 38 digits. - As of Hive 0.13 users can specify scale and precision when creating tables with the
DECIMAL
datatype using aDECIMAL(precision, scale)
syntax. If scale is not specified, it defaults to 0 (no fractional digits). If no precision is specified, it defaults to 10.
Code Block | ||||
---|---|---|---|---|
| ||||
CREATE TABLE foo (
a DECIMAL, -- Defaults to decimal(10,0)
b DECIMAL(9, 7)
)
|
For usage, see 82706456 in the Literals section below.
Decimal Literals
Anchor | ||||
---|---|---|---|---|
|
Integral literals larger than BIGINT must be handled with Decimal(38,0). The Postfix BD is required. Example:
select CAST(18446744073709001000BD AS DECIMAL(38,0)) from my_table limit 1;
Decimal Type Incompatibilities between Hive 0.12.0 and 0.13.0
With the changes in the Decimal data type in Hive 0.13.0, the pre-Hive 0.13.0 columns (of type "decimal") will be treated as being of type decimal(10,0). What this means is that existing data being read from these tables will be treated as 10-digit integer values, and data being written to these tables will be converted to 10-digit integer values before being written. To avoid these issues, Hive users on 0.12 or earlier with tables containing Decimal columns will be required to migrate their tables, after upgrading to Hive 0.13.0 or later.
Upgrading Pre-Hive 0.13.0 Decimal Columns
If the user was on Hive 0.12.0 or earlier and created tables with decimal columns, they should perform the following steps on these tables after upgrading to Hive 0.13.0 or later.
- Determine what precision/scale you would like to set for the decimal column in the table.
For each decimal column in the table, update the column definition to the desired precision/scale
For usage, see Floating Point Types in the Literals section below.
...
Integral literals larger than BIGINT must be handled with Decimal(38,0). The Postfix BD is required. Example:
select CAST(18446744073709001000BD AS DECIMAL(38,0)) from my_table limit 1;
Decimal Type Incompatibilities between Hive 0.12.0 and 0.13.0
With the changes in the Decimal data type in Hive 0.13.0, the pre-Hive 0.13.0 columns (of type "decimal") will be treated as being of type decimal(10,0). What this means is that existing data being read from these tables will be treated as 10-digit integer values, and data being written to these tables will be converted to 10-digit integer values before being written. To avoid these issues, Hive users on 0.12 or earlier with tables containing Decimal columns will be required to migrate their tables, after upgrading to Hive 0.13.0 or later.
Upgrading Pre-Hive 0.13.0 Decimal Columns
If the user was on Hive 0.12.0 or earlier and created tables with decimal columns, they should perform the following steps on these tables after upgrading to Hive 0.13.0 or later.
- Determine what precision/scale you would like to set for the decimal column in the table.
For each decimal column in the table, update the column definition to the desired precision/scale using the ALTER TABLE command:
Code Block sql sql ALTER TABLE foo CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);
If the table is not a partitioned table, then you are done. If the table has partitions, then go on to step 3.
If the table is a partitioned table, then find the list of partitions for the table:
Code Block sql sql SHOW PARTITIONS foo; ds=2008-04-08/hr=11 ds=2008-04-08/hr=12 ...
Each existing partition in the table must also have its DECIMAL column changed to add the desired precision/scale.
This can be done with a single ALTER TABLE CHANGE COLUMN by using dynamic partitioning (available for ALTER TABLE CHANGE COLUMN in Hive 0.14 or later, with HIVE-8411):
Code Block sql sql SET hive.exec.dynamic.partition = true; -- hive.exec.dynamic.partition needs to be set to true to enable dynamic partitioning with ALTER PARTITION -- This will alter all existing partitions of the table - be sure you know what you are doing! ALTER TABLE foo PARTITION (ds, hr) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);
Alternatively, this can be done one partition at a time using ALTER TABLE CHANGE COLUMN, by specifying one partition per statement (This is available in Hive 0.14 or later, with HIVE-7971.):
Code Block sql sql ALTER TABLE foo PARTITION (ds='2008-04-08', hr=11) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18); ALTER TABLE foo PARTITION (ds='2008-04-08', hr=12) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18); ...
...
Warning | ||
---|---|---|
| ||
The UNIONTYPE datatype was introduced in Hive 0.7.0 (HIVE-537), but full support for this type in Hive remains incomplete. Queries that reference UNIONTYPE fields in JOIN (HIVE-2508), WHERE, and GROUP BY clauses will fail, and Hive does not define syntax to extract the tag or value fields of a UNIONTYPE. This means that UNIONTYPEs are effectively lookpass-atthrough-only. |
Union types can at any one point hold exactly one of their specified data types. You can create an instance of this type using the create_union
UDF:
...
Info | ||
---|---|---|
| ||
Decimal datatype was introduced in Hive 0.11.0 (HIVE-2693). See Decimal Datatype above.
|
Decimal literals provide precise values and greater range for floating point numbers than the DOUBLE type. Decimal data types store exact representations of numeric values, while DOUBLE data types store very close approximations of numeric values.
...
Basic mathematical operations that can use decimal types include:
- Positive
- Negative
- Addition
- Subtraction
- Multiplication
- Division
- Average (avg)
- Sum
- Count
- Modulus (pmod)
- Addition
- Subtraction
- Multiplication
- Division
- Average (avg)
- Sum
- Count
- Modulus (pmod)
- Sign – Hive 0.13.0 and later
- Exp – Hive 0.13.0 and later
- Ln – Hive 0.13.0 and later
- Log2 – Hive 0.13.0 and later
- Log10 – Hive 0.13.0 and later
- Log(base) – Hive 0.13.0 and later
- Sqrt – Hive 0.13.0 and later
- Sin – Hive 0.13.0 and later
- Asin – Sign – Hive 0.13.0 and and later
- Exp – Cos – Hive 0.13.0 and later
- Ln – Acos – Hive 0.13.0 and later
- Log2 – Tan – Hive 0.13.0 and later
- Log10 – Atan – Hive 0.13.0 and later
- Log(base) – Radians – Hive 0.13.0 and later
- Sqrt – Degrees – Hive 0.13.0 and later
- Sin – Hive 0.13.0 and later
- Asin – Hive 0.13.0 and later
- Cos – Hive 0.13.0 and later
- Acos – Hive 0.13.0 and later
- Tan – Hive 0.13.0 and later
- Atan – Hive 0.13.0 and later
- Radians – Hive 0.13.0 and later
- Degrees – Hive 0.13.0 and later
These rounding functions can also take decimal types:
- Floor
- Ceiling
- Round
Power(decimal, n) only supports positive integer values for the exponent n.
Casting Decimal Values
Casting is supported between decimal values and any other primitive type such as integer, double, boolean, and so on.
Testing Decimal Types
Two new tests have been added as part of the TestCliDriver framework within Hive. They are decimal_1.q and decimal_2.q. Other tests such as udf7.q cover the gamut of UDFs mentioned above.
More tests need to be added that demonstrate failure or when certain types of casts are prevented (for example, casting to date). There is some ambiguity in the round function because the rounding of Decimal does not work exactly as the SQL standard, and therefore it has been omitted in the current work.
For general information about running Hive tests, see How to Contribute to Apache Hive and Hive Developer FAQ.
Handling of NULL Values
Missing values are represented by the special value NULL. To import data with NULL fields, check documentation of the SerDe used by the table. (The default Text Format uses LazySimpleSerDe which interprets the string \N as NULL when importing.)
Allowed Implicit Conversions
- later
These rounding functions can also take decimal types:
- Floor
- Ceiling
- Round
Power(decimal, n) only supports positive integer values for the exponent n.
Casting Decimal Values
Casting is supported between decimal values and any other primitive type such as integer, double, boolean, and so on.
Testing Decimal Types
Two new tests have been added as part of the TestCliDriver framework within Hive. They are decimal_1.q and decimal_2.q. Other tests such as udf7.q cover the gamut of UDFs mentioned above.
More tests need to be added that demonstrate failure or when certain types of casts are prevented (for example, casting to date). There is some ambiguity in the round function because the rounding of Decimal does not work exactly as the SQL standard, and therefore it has been omitted in the current work.
For general information about running Hive tests, see How to Contribute to Apache Hive and Hive Developer FAQ.
Handling of NULL Values
Missing values are represented by the special value NULL. To import data with NULL fields, check documentation of the SerDe used by the table. (The default Text Format uses LazySimpleSerDe which interprets the string \N as NULL when importing.)
Change Types
When hive.metastore.disallow.incompatible.col.type.changes is set to false, the types of columns in Metastore can be changed from any type to any other type. After such a type change, if the data can be shown correctly with the new type, the data will be displayed. Otherwise, the data will be displayed as NULL.
Allowed Implicit Conversions
...
void | boolean | tinyint | smallint | int | bigint | float | double | decimal | string | varchar | timestamp | date | binary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
void to | true | true | true | true | true | true | true | true | true | true | true | true | true | true |
boolean to | false | true | false | false | false | false | false | false | false | false | false | false | false | false |
tinyint to | false | false | true | true | true | true | true | true | true | true | true | false | false | false |
smallint to | false | false | false | true | true | true | true | true | true | true | true | false | false | false |
int to | false | false | false | false | true | true | true | true | true | true | true | false | false | false |
bigint to | false | false | false | false | false | true | true | true | true | true | true | false | false | false |
float to | false | false | false | false | false | false | true | true | true | true | true | false | false | false |
double to | false | false | false | false | false | false | false | true | true | true | true | false | false | false |
decimal to | false | false | false | false | false | false | false | false | true | true | true | false | false | false |
string to | false | false | false | false | false | false | false | true | true | true | true | false | false | false |
varchar to | false | false | false | false | false | false | false | true | true | true | true | false | false | false |
timestamp to | false | false | false | false | false | false | false | false | false | true | true | true | false | false |
date to | false | false | false | false | false | false | false | false | false | true | true | false | true | false |
binary to | false | false | false | false | false | false | false | false | false | false | false | false | false | true |
...