NOTE: Apache Hive recommends that custom ObjectInspectors created for use with custom SerDes have a no-argument constructor in addition to their normal constructors for serialization purposes. See HIVE-5380 for more details.
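The recommendation above can be sketched as follows, assuming the serialization framework rebuilds objects by calling a no-argument constructor reflectively. PointObjectInspector is a hypothetical stand-in: a real custom ObjectInspector would implement Hive's ObjectInspector interface, which is left out here so that the example compiles on its own.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom ObjectInspector (not a Hive class).
class PointObjectInspector {
    private List<String> fieldNames;

    // No-argument constructor: allows a serializer to create an empty
    // instance reflectively before restoring its fields.
    protected PointObjectInspector() {
        this.fieldNames = new ArrayList<>();
    }

    // Normal constructor used by the SerDe itself.
    PointObjectInspector(List<String> fieldNames) {
        this.fieldNames = new ArrayList<>(fieldNames);
    }

    List<String> getFieldNames() {
        return fieldNames;
    }

    // Mimics what a reflection-based serializer does on deserialization
    // (an assumption about the framework, for illustration only).
    static PointObjectInspector newEmptyInstance() {
        try {
            Constructor<PointObjectInspector> ctor =
                PointObjectInspector.class.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable no-arg constructor", e);
        }
    }
}
```

Without the no-argument constructor, getDeclaredConstructor() would throw NoSuchMethodException, which is the kind of failure the recommendation is meant to avoid.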

Registration of Native SerDes

As of Hive 0.14, a registration mechanism has been introduced for native Hive SerDes. It lets the 'STORED AS' syntax act as shorthand for the triplet of SerDe, InputFormat, and OutputFormat specifications in a CREATE TABLE statement.

The following mappings have been added:

Syntax                  Equivalent
----------------------  ------------------------------------------------------------------
STORED AS AVRO          ROW FORMAT SERDE
                          'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
                        STORED AS INPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
                        OUTPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

STORED AS ORC /         ROW FORMAT SERDE
STORED AS ORCFILE         'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
                        STORED AS INPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
                        OUTPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

STORED AS PARQUET /     ROW FORMAT SERDE
STORED AS PARQUETFILE     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
                        STORED AS INPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
                        OUTPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

STORED AS RCFILE        STORED AS INPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
                        OUTPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

STORED AS TEXTFILE      STORED AS INPUTFORMAT
                          'org.apache.hadoop.mapred.TextInputFormat'
                        OUTPUTFORMAT
                          'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'

To add a new native SerDe with STORED AS syntax, follow these steps:

  1. Create a class that extends AbstractStorageFormatDescriptor and returns a mapping from a 'STORED AS' keyword to an {InputFormat, OutputFormat, SerDe} triplet.

  2. Add the name of the class to the StorageFormatDescriptor registration file.
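The two steps above can be sketched in a self-contained form. The FormatDescriptor interface and FormatRegistry class below are simplified, hypothetical stand-ins so that the example compiles without Hive on the classpath. In Hive itself the descriptor would extend AbstractStorageFormatDescriptor and registration would go through the registration file from step 2, not an in-memory map. The class names returned by the descriptor are taken from the ORC row of the mapping table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the descriptor contract (assumed shape, not Hive's API).
interface FormatDescriptor {
    Set<String> getNames();      // 'STORED AS' keywords handled by this descriptor
    String getInputFormat();
    String getOutputFormat();
    String getSerde();
}

// Step 1: map one or more keywords to an {InputFormat, OutputFormat, SerDe} triplet.
class OrcFormatDescriptor implements FormatDescriptor {
    public Set<String> getNames() {
        return new HashSet<>(Arrays.asList("ORC", "ORCFILE"));
    }
    public String getInputFormat() {
        return "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    }
    public String getOutputFormat() {
        return "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat";
    }
    public String getSerde() {
        return "org.apache.hadoop.hive.ql.io.orc.OrcSerde";
    }
}

// Step 2 (stand-in): an in-memory registry playing the role of the registration file.
class FormatRegistry {
    private final Map<String, FormatDescriptor> byKeyword = new HashMap<>();

    void register(FormatDescriptor descriptor) {
        for (String name : descriptor.getNames()) {
            byKeyword.put(name.toUpperCase(Locale.ROOT), descriptor);
        }
    }

    // Returns the descriptor for a keyword, or null if none is registered.
    FormatDescriptor lookup(String keyword) {
        return byKeyword.get(keyword.trim().toUpperCase(Locale.ROOT));
    }
}
```

With this shape, registering an OrcFormatDescriptor makes both "ORC" and "ORCFILE" resolve to the same triplet, which mirrors how the two keywords share one row in the table.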

MetaStore

The MetaStore contains metadata about tables, partitions, and databases. The Query Processor uses this metadata during plan generation.
