Comment: Change references to fixed and bytes to reflect new binary mapping in 0.12 (HIVE-3264)

...

| Avro type | Becomes Hive type | Note |
|---|---|---|
| null | void | |
| boolean | boolean | |
| int | int | |
| long | bigint | |
| float | float | |
| double | double | |
| bytes | binary | Bytes are converted to Array[smallint] prior to Hive 0.12.0. |
| string | string | |
| record | struct | |
| map | map | |
| list | array | |
| union | union | Unions of [T, null] transparently convert to nullable T; other unions translate directly to Hive's unions of those types. However, unions were introduced in Hive 0.7 and cannot currently be used in where/group-by statements. They are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation applies only to unions of multiple types or unions not of a single type and null. |
| enum | string | Hive has no concept of enums. |
| fixed | binary | Fixeds are converted to Array[smallint] prior to Hive 0.12.0. |

...

In this example we're pulling the source-of-truth reader schema from a webserver. Other options for providing the schema are described below.
Add the Avro files to the database (or create an external table) using standard Hive operations (http://wiki.apache.org/hadoop/Hive/LanguageManual/DML).
Describing this table might produce output like the following:

Code Block
hive> describe kst;
OK
string1 string  from deserializer
string2 string  from deserializer
int1    int     from deserializer
boolean1        boolean from deserializer
long1   bigint  from deserializer
float1  float   from deserializer
double1 double  from deserializer
inner_record1   struct<int_in_inner_record1:int,string_in_inner_record1:string> from deserializer
enum1   string  from deserializer
array1  array<string>   from deserializer
map1    map<string,string>      from deserializer
union1  uniontype<float,boolean,string> from deserializer
fixed1  binary  from deserializer
null1   void    from deserializer
unionnullint    int     from deserializer
bytes1  binary  from deserializer

At this point, the Avro-backed table can be worked with in Hive like any other table.
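
As a rough illustration of what "like any other table" means here, the queries below are a minimal sketch against the kst columns shown in the description above. The map key 'some_key' is a hypothetical placeholder, and the filter values are chosen only for illustration. The queries avoid union1 because, as noted in the type table, true unions cannot be used in where/group-by statements. unionnullint is exposed as a plain nullable int, so it can be filtered normally.

```sql
-- Select primitive columns and a field nested inside the struct column.
SELECT string1, int1, inner_record1.string_in_inner_record1
FROM kst
WHERE boolean1 = true;

-- Avro enums surface as Hive strings, so they can be grouped like any string.
SELECT enum1, COUNT(*) AS n
FROM kst
GROUP BY enum1;

-- Index into array and map columns ('some_key' is a hypothetical key).
SELECT array1[0], map1['some_key']
FROM kst
LIMIT 10;

-- A nullable-int column can be filtered like an ordinary INT.
SELECT string1, unionnullint
FROM kst
WHERE unionnullint IS NOT NULL;
```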

...

Code Block
CREATE TABLE test_serializer(string1 STRING,
                             int1 INT,
                             tinyint1 TINYINT,
                             smallint1 SMALLINT,
                             bigint1 BIGINT,
                             boolean1 BOOLEAN,
                             float1 FLOAT,
                             double1 DOUBLE,
                             list1 ARRAY<STRING>,
                             map1 MAP<STRING,INT>,
                             struct1 STRUCT<sint:INT,sboolean:BOOLEAN,sstring:STRING>,
                             union1 uniontype<FLOAT, BOOLEAN, STRING>,
                             enum1 STRING,
                             nullableint INT,
                             bytes1 BINARY,
                             fixed1 BINARY)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '#' LINES TERMINATED BY '\n'
 STORED AS TEXTFILE;
 

...

| why hello there | 42 | 3 | 100 | 1412341 | true | 42.43 | 85.23423424 | alpha:beta:gamma | Earth#42:Control#86:Bob#31 | 17:true:Abe Linkedin | 0:3.141459 | BLUE | 72 | ^A^B^C | ^A^B^C |
| another record | 98 | 4 | 101 | 9999999 | false | 99.89 | 0.00000009 | beta | Earth#101 | 1134:false:wazzup | 1:true | RED | NULL | ^D^E^F^G | ^D^E^F |
| third record | 45 | 5 | 102 | 999999999 | true | 89.99 | 0.00000000000009 | alpha:gamma | Earth#237:Bob#723 | 102:false:BNL | 2:Time to go home | GREEN | NULL | ^H | ^G^H^I |

one can write it out to Avro with:

...