Storage Formats

SerDes and Storage Formats

HCatalog uses Hive's SerDe class to serialize and deserialize data. SerDes are provided for RCFile, CSV text, JSON text, SequenceFile and ORC formats. Check the Hive documentation for additional SerDes that might be included in new versions. For example, the Avro SerDe was added in Hive 0.9.1 and the ORC file format was added in Hive 0.11.0.

Users can write SerDes for custom formats by following the instructions in the Hive SerDe documentation.

For information about how to create a table with a custom or native SerDe, see Row Format, Storage Format, and SerDe.
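As a rough illustration of what such a table definition can look like, the sketch below declares one table with a SerDe named explicitly and one with a native storage format. The table and column names (web_events, web_events_rc, user_id, url, event_time) are hypothetical. The SerDe class name assumes a release in which HCatalog ships with Hive; in HCatalog 0.4 and 0.5 the class lived in the org.apache.hcatalog.data package instead.

```sql
-- Hypothetical table that stores each row as one JSON object per line.
-- Assumption: the HCatalog JSON SerDe class is on Hive's classpath.
CREATE TABLE web_events (
  user_id    BIGINT,
  url        STRING,
  event_time STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- Hypothetical table using a native format; Hive selects the matching
-- SerDe itself, so no ROW FORMAT clause is needed.
CREATE TABLE web_events_rc (
  user_id    BIGINT,
  url        STRING,
  event_time STRING
)
STORED AS RCFILE;
```

Other native formats follow the same pattern, for example STORED AS SEQUENCEFILE, or STORED AS ORC in Hive 0.11.0 and later.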

Usage from Hive

Hive and HCatalog (version 0.4 and later) share the same storage abstractions, and thus, you can read from and write to HCatalog tables from within Hive, and vice versa.

However, for HCatalog versions 0.4 and 0.5, Hive does not know where to find the HCatalog jar by default, so if you use any features introduced by HCatalog, such as a table using the JSON SerDe, you might get a "class not found" exception. In this situation, before you run Hive, set the environment variable HIVE_AUX_JARS_PATH to the directory that contains your HCatalog jar. (If you followed the examples in the Installation document, that directory should be /usr/local/hcat/share/hcatalog/.)

After version 0.5, HCatalog is part of the Hive distribution and you do not have to add the HCatalog jar to HIVE_AUX_JARS_PATH.

CTAS Issue with JSON SerDe

Using the Hive CREATE TABLE ... AS SELECT (CTAS) command with a JSON SerDe results in a table that has column headers such as "_col0", which can be read by HCatalog or Hive but cannot be easily read by external users. To avoid this issue, create the table in two steps instead of using CTAS:

  1. CREATE TABLE ...
  2. INSERT OVERWRITE TABLE ... SELECT ...

See HCATALOG-436 for details.
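The two-step workaround could look roughly like the following sketch. The source table page_views, the target table page_views_json, and their columns are hypothetical, and the SerDe class name assumes a release in which HCatalog ships with Hive (HCatalog 0.4 and 0.5 used the org.apache.hcatalog.data package).

```sql
-- Step 1: create the target table with explicit column names,
-- so the JSON keys are user_id and url rather than _col0, _col1.
CREATE TABLE page_views_json (
  user_id BIGINT,
  url     STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- Step 2: populate it from the (hypothetical) source table.
INSERT OVERWRITE TABLE page_views_json
SELECT user_id, url
FROM page_views;
```

Because the column names are declared in the CREATE TABLE statement, they do not depend on the names Hive generates for the SELECT output.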


Navigation Links

Previous: Command Line Interface
Next: Dynamic Partitioning

SerDe general information: Hive SerDe
SerDe details: SerDe
SerDe DDL: Row Format, Storage Format, and SerDe

General: HCatalog Manual, WebHCat Manual, Hive Wiki Home, Hive Project Site
Old version of this document (HCatalog 0.5.0): Storage Formats
