Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Updated table creation statements to match syntax for TBLPROPERTIES.

...

Avro type

Becomes Hive type

Note

null

void

boolean

boolean

int

int

long

bigint

float

float

double

double

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="4fe36a4646cba818-a2dbbefc-42334745-b4ed9d98-5871eb318709f062bd8f986e"><ac:plain-text-body><![CDATA[

bytes

Array[smallint]

Hive converts these to signed bytes.

]]></ac:plain-text-body></ac:structured-macro>

string

string

record

struct

map

map

list

array

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="170b7caf06cd6af9-92b7cf42-447e48ac-93bb826c-a7552678f1fc85f3b67a2a3e"><ac:plain-text-body><![CDATA[

union

union

Unions of [T, null] transparently convert to nullable T, other types translate directly to Hive's unions of those types. However, unions were introduced in Hive 7 and are not currently able to be used in where/group-by statements. They are essentially look-at-only. Because the AvroSerde transparently converts [T,null], to nullable T, this limitation only applies to unions of multiple types or unions not of a single type and null.

]]></ac:plain-text-body></ac:structured-macro>

enum

string

Hive has no concept of enums

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="ef1e0fe9522df9d1-494f9071-43a6465c-ad1fb38e-71820842c251721ffd834b32"><ac:plain-text-body><![CDATA[

fixed

Array[smallint]

Hive converts the bytes to signed int

]]></ac:plain-text-body></ac:structured-macro>

...

Code Block
CREATE TABLE kst
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH TBLPROPERTIES (
    'avro.schema.url'='http://schema_provider/kst.avsc')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

  TBLPROPERTIES (
    'avro.schema.url'='http://schema_provider/kst.avsc');

In this example we're pulling the source-of-truth reader schema In this example we're pulling the source-of-truth reader schema from a webserver. Other options for providing the schema are described below.
Add the Avro files to the database (or create an external table) using standard Hive operations(http://wiki.apache.org/hadoop/Hive/LanguageManual/DML).
This table might result in a description as below:

...

Code Block
CREATE TABLE as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH TBLPROPERTIES (
    'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
insert overwrite table as_avro select * from TBLPROPERTIES (
    'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
insert overwrite table as_avro select * from test_serializer;

The files that are written by the Hive job are valid Avro files, however, MapReduce doesn't add the standard .avro extension. If you copy these files out, you'll likely want to rename them with .avro.

...

Code Block
CREATE TABLE embedded
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.AvroSerDe'
  WITHSTORED TBLPROPERTIESAS (INPUTFORMAT
    'avro.schema.literal'='{'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
    "namespace": "com.howdy",
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.howdy",
      "name": "some_schema",
      "type": "record",
      "fields": [ { "name":"string1","type":"string"}]
    }')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

Note that the value is enclosed in single quotes and just pasted into the create statement.

...

Code Block
set hiveconf:schema;
DROP TABLE example;
CREATE TABLE example
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.AvroSerDe'
  WITH TBLPROPERTIES (
    'avro.schema.literal'='${hiveconf:schema}')

  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='${hiveconf:schema}');

To execute this script file, assuming $SCHEMA has been defined to be the escaped schema value:

...