Versions Compared

Comment: Migrated to Confluence 4.0

...

  • Infers the schema of the Hive table from the Avro schema.
  • Reads all Avro files within a table against a specified schema, taking advantage of Avro's backward-compatibility capabilities.
  • Supports arbitrarily nested schemas.
  • Translates all Avro data types into equivalent Hive types. Most types map exactly, but some Avro types don't exist in Hive and are automatically converted by the AvroSerde.
  • Understands compressed Avro files.
  • Transparently converts the Avro idiom of handling nullable types as Union[T, null] into just T and returns null when appropriate.
  • Writes any Hive table to Avro files.
  • Has worked reliably against our most convoluted Avro schemas in our ETL process.
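The bullet about reading every file against one specified schema relies on Avro's schema resolution rules. The following is a minimal Python sketch of the record-level part of those rules only, not the AvroSerde's or the Avro library's actual code: a field missing from the written data takes the reader schema's default, and a written field the reader schema does not declare is ignored. The function name `resolve_record` and the sample field names are hypothetical.

```python
def resolve_record(written, reader_fields):
    """Project a decoded record (a dict) onto the reader schema's fields.

    Simplified illustration of Avro schema resolution for records:
    - a field present in the data is kept,
    - a field missing from the data takes the reader's default,
    - a field the reader does not declare is dropped.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# Hypothetical reader schema: 'country' was added after older files were written.
reader_fields = [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},
]

# A record from an older file: it lacks 'country' and carries a retired field.
old_record = {"id": 7, "legacy_flag": True}
new_view = resolve_record(old_record, reader_fields)
```

Here `new_view` contains `id` from the file and `country` filled from the default, which is why older and newer files can sit in the same table.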

...

| Avro type | Becomes Hive type | Note |
|---|---|---|
| null | void | |
| boolean | boolean | |
| int | int | |
| long | bigint | |
| float | float | |
| double | double | |
| bytes | binary | Bytes are converted to Array[smallint] prior to Hive 0.12.0 |
| string | string | |
| record | struct | |
| map | map | |
| list | array | |
| union | union | Unions of [T, null] transparently convert to nullable T; other unions translate directly to Hive's unions of those types. However, unions were introduced in Hive 0.7.0 and cannot currently be used in WHERE or GROUP BY clauses; they are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation applies only to unions of multiple types or unions that are not of a single type and null. |
| enum | string | Hive has no concept of enums |
| fixed | binary | Fixeds are converted to Array[smallint] prior to Hive 0.12.0 |
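The mapping in the table above can be sketched as a small recursive function over a parsed Avro schema. This is an illustrative Python sketch, not the AvroSerde's implementation. The function name `avro_to_hive` is hypothetical, the output uses Hive's type-string notation (`struct<...>`, `map<string,...>`, `array<...>`, `uniontype<...>`), and the Hive 0.12.0 or later mapping is assumed for bytes and fixed. It also assumes that unions other than [T, null] map every branch as listed, and it does not resolve references to previously defined named types. Note that the Avro JSON spelling of the list type is `array`.

```python
# Primitive mappings taken from the table (binary assumes Hive 0.12.0 or later).
PRIMITIVES = {
    "null": "void",
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def avro_to_hive(schema):
    """Return a Hive type string for a parsed Avro schema (str, list, or dict)."""
    if isinstance(schema, str):
        return PRIMITIVES[schema]
    if isinstance(schema, list):  # an Avro union is a JSON array of branches
        non_null = [branch for branch in schema if branch != "null"]
        if len(schema) == 2 and len(non_null) == 1:
            # [T, null] or [null, T] becomes plain, nullable T.
            return avro_to_hive(non_null[0])
        # Assumption: any other union keeps all of its branches.
        return "uniontype<" + ",".join(avro_to_hive(b) for b in schema) + ">"
    kind = schema["type"]
    if kind == "record":
        fields = ",".join(
            f"{f['name']}:{avro_to_hive(f['type'])}" for f in schema["fields"]
        )
        return f"struct<{fields}>"
    if kind == "map":
        # Avro map keys are always strings.
        return f"map<string,{avro_to_hive(schema['values'])}>"
    if kind == "array":
        return f"array<{avro_to_hive(schema['items'])}>"
    if kind == "enum":
        return "string"  # Hive has no concept of enums
    if kind == "fixed":
        return "binary"
    return avro_to_hive(kind)  # wrapped form such as {"type": "string"}


# Hypothetical schema combining a primitive, a list, and a nullable field.
example = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "note", "type": ["null", "string"]},
    ],
}
hive_type = avro_to_hive(example)
```

Because records, maps, and arrays recurse into their element types, the same function covers arbitrarily nested schemas.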

Creating Avro-backed Hive tables

...