...
Avro type | Becomes Hive type | Note | ||
---|---|---|---|---|
null | void | |||
boolean | boolean | |||
int | int | |||
long | bigint | |||
float | float | |||
double | double | |||
bytes | Array[smallint] | Hive converts these to signed bytes. | ||
string | string | |||
record | struct | |||
map | map | |||
array | array | |||
union | union | Unions of [T, null] transparently convert to nullable T; other unions translate directly to Hive's unions of those types. However, unions were introduced in Hive 0.7 and cannot currently be used in where/group-by statements. They are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation applies only to unions of multiple types or unions that are not of a single type and null. | ||
enum | string | Hive has no concept of enums | ||
fixed | Array[smallint] | Hive converts the bytes to signed int | ||
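The union row above can be illustrated with a small sketch. The table name, namespace, and field names below are hypothetical, and the schema is supplied inline through avro.schema.literal purely to keep the example self-contained. Going by the conversion rules in the table, the first field should surface as a nullable string, while the second should remain a Hive union and therefore be look-at-only.

```sql
-- Hypothetical table and field names, for illustration only.
CREATE TABLE union_example
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
  "namespace": "com.example",
  "name": "union_example",
  "type": "record",
  "fields": [
    {"name": "nickname", "type": ["null", "string"]},
    {"name": "reading",  "type": ["int", "string"]}
  ]
}');

-- nickname is a union of a single type and null, so it is expected to appear
-- as a plain (nullable) string column.
-- reading is a union of two non-null types, so it is expected to appear as a
-- Hive union and cannot be used in where/group-by statements.
DESCRIBE union_example;
```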
...
Code Block |
---|
CREATE TABLE as_avro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
-- Copy the rows of the existing test_serializer table into the Avro-backed table.
INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
|
...
There are three ways to provide the reader schema for an Avro table, all of which involve parameters to the serde. As the schema evolves, one can update these values by updating the parameters in the table.
Use avro.schema.url
Specifies a URL to access the schema from. For http schemas, this works for testing and small-scale clusters, but because the schema is accessed at least once from each task in the job, a large job can quickly turn into a DDoS attack against the URL provider (a web server, for instance). Use caution when using this parameter for anything other than testing.
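As a rough sketch of updating this parameter on an existing table, the statement below repoints avro.schema.url at a schema file kept on HDFS rather than behind a web server. The table name and the HDFS path are assumptions made for illustration; substitute the location where your schema file actually lives.

```sql
-- Hypothetical table name and HDFS path; adjust both for your cluster.
ALTER TABLE as_avro SET TBLPROPERTIES (
'avro.schema.url'='hdfs:///path/to/the/schema/test_serializer.avsc');
```

The same statement can be reissued whenever the schema file moves or a new version is published under a different path.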
...
Note that $SCHEMA is interpolated into the quotes to correctly handle spaces within the schema.
Use none to ignore either avro.schema.literal or avro.schema.url
Hive does not provide an easy way to unset or remove a property. If you wish to switch from using the URL to the literal schema, or vice versa, set the to-be-ignored value to none and the AvroSerde will treat it as if it were not set.
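A minimal sketch of such a switch, assuming a table that was originally created with avro.schema.url and should now use an inline schema instead. The table name and the one-field record are hypothetical placeholders.

```sql
-- Switch from a URL-based schema to an inline literal.
-- Setting the URL to none tells the AvroSerde to treat it as unset.
ALTER TABLE as_avro SET TBLPROPERTIES (
'avro.schema.url'='none',
'avro.schema.literal'='{
  "namespace": "com.example",
  "name": "test_record",
  "type": "record",
  "fields": [ {"name": "id", "type": "int"} ]
}');
```

Switching in the other direction follows the same pattern: set avro.schema.literal to none and give avro.schema.url a real location.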
...
- Why do I get error-error-error-error-error-error-error and a message to check avro.schema.literal and avro.schema.url when describing a table or running a query against a table?
The AvroSerde returns this message when it has trouble finding or parsing the schema provided by either the avro.schema.literal or avro.schema.url value. It is unable to be more specific because Hive expects all calls to the serde config methods to be successful, meaning we are unable to return an actual exception. By signaling an error via this message, the table is left in a good state and the incorrect value can be corrected with a call to alter table T set TBLPROPERTIES.
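One possible recovery sequence is sketched below. The table name and the corrected path are hypothetical, and SHOW TBLPROPERTIES is assumed to be available in the Hive version in use.

```sql
-- Inspect the properties currently set on the table to spot the bad value.
SHOW TBLPROPERTIES as_avro;

-- Replace the incorrect value with one the AvroSerde can find and parse.
ALTER TABLE as_avro SET TBLPROPERTIES (
'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');

-- Describe the table again; the error placeholder columns should be replaced
-- by the columns from the schema if the new value is valid.
DESCRIBE as_avro;
```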