...
- Infers the schema of the Hive table from the Avro schema.
- Reads all Avro files within a table against a specified schema, taking advantage of Avro's backward compatibility.
- Supports arbitrarily nested schemas.
- Translates all Avro data types into equivalent Hive types. Most types map exactly, but some Avro types don't exist in Hive and are automatically converted by the AvroSerde.
- Understands compressed Avro files.
- Transparently converts the Avro idiom of handling nullable types as Union[T, null] into just T and returns null when appropriate.
- Writes any Hive table to Avro files.
- Has worked reliably against our most convoluted Avro schemas in our ETL process.
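
The backward-compatibility point in the list above can be illustrated with a small sketch. It is a simplified model of how Avro resolves a record written with an older schema against the reader's schema, not the AvroSerde's own code: fields the reader does not know are dropped, and fields missing from the data take the reader's default. The function name `resolve_record`, the `Event` record, and its fields are hypothetical, and type promotion and nested records are left out.

```python
# Illustrative only: a simplified model of Avro record schema resolution.
# This is NOT the AvroSerde implementation; names below are hypothetical.

def resolve_record(datum, reader_schema):
    """Project a decoded record (a dict) onto the fields of the reader schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in datum:
            # Field was written by the writer: keep its value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Fields present in the datum but absent from the reader schema are ignored.
    return resolved


# Hypothetical reader schema: "source" was added after old files were written.
reader_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": ["null", "string"], "default": None},
    ],
}

# A row from an older file: it has a retired field and lacks "source".
old_row = {"id": 1, "legacy_flag": True}
print(resolve_record(old_row, reader_schema))  # {'id': 1, 'source': None}
```

Giving new fields a default, as `source` has here, is what lets older files stay readable under the newer schema in this model.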
...
Avro type | Becomes Hive type | Note
---|---|---
null | void |
boolean | boolean |
int | int |
long | bigint |
float | float |
double | double |
bytes | binary | Bytes are converted to Array[smallint] prior to Hive 0.12.0.
string | string |
record | struct |
map | map |
list | array |
union | union | Unions of [T, null] transparently convert to nullable T; other unions translate directly to Hive's unions of those types. However, unions were introduced in Hive 0.7.0 and cannot currently be used in where/group-by statements. They are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation applies only to unions of multiple types, that is, unions other than a single type and null.
enum | string | Hive has no concept of enums.
fixed | binary | Fixeds are converted to Array[smallint] prior to Hive 0.12.0.
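
As a rough illustration of the table above, the following sketch maps a parsed Avro schema (the JSON form loaded into Python dicts, lists, and strings) to a Hive type string. It is not the AvroSerde's implementation. The function name `hive_type` and the sample `User` schema are hypothetical. It assumes Hive 0.12.0 or later (so bytes and fixed become binary), and it does not handle references to named types or unions of null with more than one other type in any special way.

```python
# Illustrative only: mirrors the type table, not the AvroSerde source code.

PRIMITIVES = {
    "null": "void",
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",   # assumes Hive 0.12.0 or later
    "string": "string",
}


def hive_type(schema):
    """Return a Hive type string for a parsed Avro schema (hypothetical helper)."""
    if isinstance(schema, str):
        return PRIMITIVES[schema]
    if isinstance(schema, list):
        # A JSON list is an Avro union.
        branches = [s for s in schema if s != "null"]
        if len(schema) == 2 and len(branches) == 1:
            # [T, null] collapses to a nullable T.
            return hive_type(branches[0])
        return "uniontype<" + ",".join(hive_type(s) for s in schema) + ">"
    kind = schema["type"]
    if kind == "record":
        fields = ",".join(
            f"{f['name']}:{hive_type(f['type'])}" for f in schema["fields"]
        )
        return f"struct<{fields}>"
    if kind == "map":
        # Avro map keys are always strings.
        return f"map<string,{hive_type(schema['values'])}>"
    if kind == "array":
        # The "list" row of the table; Avro's JSON type name is "array".
        return f"array<{hive_type(schema['items'])}>"
    if kind == "enum":
        return "string"  # Hive has no concept of enums
    if kind == "fixed":
        return "binary"  # assumes Hive 0.12.0 or later
    return hive_type(kind)  # wrapped form such as {"type": "string"}


# Hypothetical schema used only for this example.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "nickname", "type": ["null", "string"]},
        {"name": "status", "type": {"type": "enum", "name": "Status",
                                    "symbols": ["ACTIVE", "INACTIVE"]}},
        {"name": "attrs", "type": {"type": "map", "values": "int"}},
    ],
}

print(hive_type(user_schema))
# struct<id:bigint,tags:array<string>,nickname:string,status:string,attrs:map<string,int>>
```

Note how the `nickname` field, declared as a union of null and string, comes out as a plain string column, while a union such as `["int", "string"]` would remain a Hive union and carry the where/group-by limitation described in the table.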
Creating Avro-backed Hive tables
...