
...

Avro type | Becomes Hive type | Note
null | void |
boolean | boolean |
int | int |
long | bigint |
float | float |
double | double |
bytes | Array[smallint] | Hive converts these to signed bytes.
string | string |
record | struct |
map | map |
list | array |
union | union | Unions of [T, null] transparently convert to nullable T; unions of other types translate directly to Hive unions of those types. However, unions were introduced in Hive 0.7 and cannot currently be used in where/group-by statements; they are essentially look-at-only. Because the AvroSerde transparently converts [T, null] to nullable T, this limitation only applies to unions of multiple types or unions that are not of a single type and null.
enum | string | Hive has no concept of enums.
fixed | Array[smallint] | Hive converts the bytes to signed int.
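
For example, a nullable column is expressed in Avro as a union of null and the column's type. A minimal sketch of this (the table and field names below are made up for illustration), using the same DDL pattern as the examples that follow:

Code Block
CREATE TABLE union_example
  COMMENT "hypothetical table: the [null, string] union surfaces as a nullable string column"
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.example",
      "name": "union_example",
      "type": "record",
      "fields": [ { "name":"optional_field", "type":["null","string"] } ]
    }')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

DESCRIBE union_example should then report optional_field as string rather than as a uniontype.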

...

Code Block
CREATE TABLE embedded
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.howdy",
      "name": "some_schema",
      "type": "record",
      "fields": [ { "name":"string1","type":"string"}]
    }')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

Note that the value is enclosed in single quotes and just pasted into the create statement.

Use avro.schema.literal and pass the schema into the script

Hive supports simple variable substitution, so the schema can be embedded in a variable and passed to the script. Note that to do this, the schema must be completely escaped (carriage returns converted to \n, tabs to \t, quotes escaped, etc.). An example:

Code Block
set hiveconf:schema;
DROP TABLE example;
CREATE TABLE example
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='${hiveconf:schema}')

  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

...

Note that $SCHEMA is interpolated into the quotes to correctly handle spaces within the schema.
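
As a rough sketch of such an invocation (the script file name is hypothetical, and the schema shown is the escaped form of the earlier example):

Code Block
# Hypothetical invocation: SCHEMA holds the fully escaped schema text
export SCHEMA="{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"string1\",\"type\":\"string\"}]}"
hive --hiveconf schema="${SCHEMA}" -f your_script_file.sql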

Use none to ignore either avro.schema.literal or avro.schema.url

Hive does not provide an easy way to unset or remove a property. If you wish to switch from using avro.schema.url to avro.schema.literal (or vice versa), set the to-be-ignored property to none and the AvroSerde will treat it as if it were not set.
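
For instance, a minimal sketch (the table name is hypothetical) of switching a table from a URL-supplied schema to an embedded one by neutralizing avro.schema.url:

Code Block
-- Hypothetical table name: ignore the previously set URL and supply the schema inline instead
ALTER TABLE my_avro_table SET TBLPROPERTIES (
  'avro.schema.url'='none',
  'avro.schema.literal'='{
    "namespace": "com.howdy",
    "name": "some_schema",
    "type": "record",
    "fields": [ { "name":"string1","type":"string"}]
  }');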

...

  • Why do I get error-error-error-error-error-error-error and a message to check avro.schema.literal and avro.schema.url when describing a table or running a query against a table?

The AvroSerde returns this message when it has trouble finding or parsing the schema provided by either the avro.schema.literal or avro.schema.url value. It cannot be more specific because Hive expects all calls to the SerDe configuration methods to succeed, which means no actual exception can be returned. By signaling an error via this message, the table is left in a good state and the incorrect value can be corrected with a call to ALTER TABLE T SET TBLPROPERTIES.
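
For example, if the table was created with a bad schema location, a short sketch of the fix (the table name and URL are hypothetical):

Code Block
-- Hypothetical values: point the table at a corrected, reachable schema file
ALTER TABLE my_avro_table SET TBLPROPERTIES (
  'avro.schema.url'='hdfs://namenode/schemas/some_schema.avsc');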