...
Avro type | Becomes Hive type | Note |
---|---|---|
null | void | |
boolean | boolean | |
int | int | |
long | bigint | |
float | float | |
double | double | |
bytes | binary | Bytes are converted to Array[smallint] prior to Hive 0.12.0. |
string | string | |
record | struct | |
map | map | |
list | array | |
union | union | Unions of [T, null] transparently convert to nullable T; other unions translate directly to Hive's unions of those types. However, unions were introduced in Hive 0.7 and cannot currently be used in where/group-by statements; they are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation applies only to unions of multiple types or unions not of a single type and null. |
enum | string | Hive has no concept of enums. |
fixed | binary | Fixeds are converted to Array[smallint] prior to Hive 0.12.0. |
...
In this example we're pulling the source-of-truth reader schema from a webserver. Other options for providing the schema are described below.
Add the Avro files to the database (or create an external table) using standard Hive operations (http://wiki.apache.org/hadoop/Hive/LanguageManual/DML).
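As a rough sketch of one such operation, the statement below moves existing Avro data files into the table. The HDFS path is a hypothetical placeholder, and the statement assumes the `kst` table is not partitioned (a partitioned table would also need a `PARTITION` clause).

```sql
-- Move Avro data files that already sit on HDFS into the table's storage location.
-- '/user/hive/staging/kst' is a hypothetical path; substitute the real location.
LOAD DATA INPATH '/user/hive/staging/kst' INTO TABLE kst;
```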
Describing the table might produce output like the following:
```
hive> describe kst;
OK
string1         string                                                            from deserializer
string2         string                                                            from deserializer
int1            int                                                               from deserializer
boolean1        boolean                                                           from deserializer
long1           bigint                                                            from deserializer
float1          float                                                             from deserializer
double1         double                                                            from deserializer
inner_record1   struct<int_in_inner_record1:int,string_in_inner_record1:string>   from deserializer
enum1           string                                                            from deserializer
array1          array<string>                                                     from deserializer
map1            map<string,string>                                                from deserializer
union1          uniontype<float,boolean,string>                                   from deserializer
fixed1          binary                                                            from deserializer
null1           void                                                              from deserializer
unionnullint    int                                                               from deserializer
bytes1          binary                                                            from deserializer
```
At this point, the Avro-backed table can be worked with in Hive like any other table.
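As a minimal sketch of what that can look like, the queries below read from `kst` using the column names from its description. The map key `'some_key'` and the filter conditions are hypothetical placeholders, not values taken from the original data.

```sql
-- Select scalar columns and drill into the complex types.
SELECT string1,
       int1,
       inner_record1.int_in_inner_record1,  -- field of a struct (Avro record)
       array1[0],                           -- first element of an array
       map1['some_key']                     -- hypothetical map key
FROM kst
WHERE boolean1 = true
LIMIT 10;

-- unionnullint is exposed as a plain nullable INT (presumably from an Avro
-- [null, int] union, judging by its name), so it can appear in WHERE and GROUP BY.
SELECT unionnullint, count(*)
FROM kst
WHERE unionnullint IS NOT NULL
GROUP BY unionnullint;
```

The second query relies on the [T, null] conversion described in the type table; a column such as `union1`, which remains a true `uniontype`, is subject to the where/group-by limitation noted there.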
...
```sql
CREATE TABLE test_serializer(string1 STRING,
                             int1 INT,
                             tinyint1 TINYINT,
                             smallint1 SMALLINT,
                             bigint1 BIGINT,
                             boolean1 BOOLEAN,
                             float1 FLOAT,
                             double1 DOUBLE,
                             list1 ARRAY<STRING>,
                             map1 MAP<STRING,INT>,
                             struct1 STRUCT<sint:INT,sboolean:BOOLEAN,sstring:STRING>,
                             union1 uniontype<FLOAT, BOOLEAN, STRING>,
                             enum1 STRING,
                             nullableint INT,
                             bytes1 BINARY,
                             fixed1 BINARY)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '#' LINES TERMINATED BY '\n'
 STORED AS TEXTFILE;
```
...
why hello there | 42 | 3 | 100 | 1412341 | true | 42.43 | 85.23423424 | alpha:beta:gamma | Earth#42:Control#86:Bob#31 | 17:true:Abe Linkedin | 0:3.141459 | BLUE | 72 | ^A^B^C | ^A^B^C |
another record | 98 | 4 | 101 | 9999999 | false | 99.89 | 0.00000009 | beta | Earth#101 | 1134:false:wazzup | 1:true | RED | NULL | ^D^E^F^G | ^D^E^F |
third record | 45 | 5 | 102 | 999999999 | true | 89.99 | 0.00000000000009 | alpha:gamma | Earth#237:Bob#723 | 102:false:BNL | 2:Time to go home | GREEN | NULL | ^H | ^G^H^I |
one can write it out to Avro with:
...