
...

Note that org.apache.hadoop.hive.serde is the deprecated SerDe library; see org.apache.hadoop.hive.serde2 for the current version.

Hive currently uses these FileFormat classes to read and write HDFS files (see the sketch after this list):

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat: These two classes read and write data in plain text file format.
  • SequenceFileInputFormat/SequenceFileOutputFormat: These two classes read and write data in Hadoop SequenceFile format.
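
For example, the storage format is chosen per table with the STORED AS clause. The sketch below uses illustrative table and column names:

    -- Plain text storage (TextInputFormat / HiveIgnoreKeyTextOutputFormat)
    CREATE TABLE page_views_text (viewTime INT, userid BIGINT, page_url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Hadoop SequenceFile storage (SequenceFileInputFormat / SequenceFileOutputFormat)
    CREATE TABLE page_views_seq (viewTime INT, userid BIGINT, page_url STRING)
    STORED AS SEQUENCEFILE;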

Hive currently uses these SerDe classes to serialize and deserialize data (a usage sketch follows the list):

  • MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records such as CSV, tab-separated, or Ctrl-A-separated records (quoting is not supported yet).
  • ThriftSerDe: This SerDe is used to read/write Thrift-serialized objects. The class file for the Thrift object must be loaded first.
  • DynamicSerDe: This SerDe also reads/writes Thrift-serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports a number of different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
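
A non-default SerDe is attached to a table with the ROW FORMAT SERDE clause. The sketch below is only illustrative: the fully qualified class name and the SERDEPROPERTIES key are assumptions and should be checked against the SerDe actually being used:

    -- Register the jar that contains the SerDe class (not needed for built-in SerDes)
    ADD JAR /path/to/custom_serde.jar;

    -- Attach the SerDe to a table; the class name and property key are assumptions
    CREATE TABLE apache_log (host STRING, request STRING, status INT)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe'
    WITH SERDEPROPERTIES ('serialization.format' = ',')
    STORED AS TEXTFILE;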

Also:

  • For JSON files, an Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar (see the sketch after this list).
  • An Avro SerDe was added in Hive 0.9.1.
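
A hedged sketch of how the Amazon JSON SerDe is typically used: the jar is added to the session and then referenced from the table definition. The class name com.amazon.elasticmapreduce.JsonSerde, the 'paths' property, and the S3 location below are assumptions based on Amazon's samples and should be verified against their documentation:

    -- Register the Amazon JSON SerDe jar (path from the bullet above)
    ADD JAR s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar;

    -- Class name and 'paths' property are assumptions; LOCATION is hypothetical
    CREATE EXTERNAL TABLE impressions (requestBeginTime STRING, adId STRING, impressionId STRING)
    ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
    WITH SERDEPROPERTIES ('paths' = 'requestBeginTime, adId, impressionId')
    LOCATION 's3://mybucket/path/to/impressions/';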

...

  • A SerDe for the ORC file format was added in Hive 0.11.0 (a short example follows the list).
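
In Hive 0.11.0 and later, ORC storage can be requested directly in the table definition, for example (table and column names are illustrative):

    -- Store the table in ORC format (Hive 0.11.0+)
    CREATE TABLE page_views_orc (viewTime INT, userid BIGINT, page_url STRING)
    STORED AS ORC;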

See SerDe for detailed information about input and output processing. Also see Storage Formats in the HCatalog manual, including CTAS Issue with JSON SerDe.

...