Storage Formats
SerDes and Storage Formats
HCatalog uses Hive's SerDe class to serialize and deserialize data. SerDes are provided for RCFile, CSV text, JSON text, SequenceFile and ORC formats. Check the Hive documentation for additional SerDes that might be included in new versions. For example, the Avro SerDe was added in Hive 0.9.1 and the ORC file format was added in Hive 0.11.0.
Users can write SerDes for custom formats using instructions in the Hive SerDe documentation:
- Hive SerDe in the Developer Guide
- SerDe - how to add a new SerDe in the Developer Guide
- Also see SerDe for details about input and output processing
For information about how to create a table with a custom or native SerDe, see Row Format, Storage Format, and SerDe.
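As a rough illustration of the DDL involved, the sketch below writes two CREATE TABLE statements to a script file: one that names a SerDe class explicitly and one that relies on a native storage format. The table and column names are hypothetical. The class name org.apache.hive.hcatalog.data.JsonSerDe is assumed to match a release where HCatalog ships with Hive; older standalone HCatalog releases used a different package, so check the name against your installation.

```shell
# Write the example DDL to a temporary script file.
hql_file="$(mktemp)"

cat > "$hql_file" <<'EOF'
-- Hypothetical table whose rows are JSON text, read and written through a SerDe class.
-- The class name is an assumption; verify it for your Hive/HCatalog version.
CREATE TABLE events_json (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- Hypothetical table that uses a native storage format (RCFile).
CREATE TABLE events_rc (id INT, name STRING)
STORED AS RCFILE;
EOF

# To execute the statements (requires a Hive installation): hive -f "$hql_file"
cat "$hql_file"
```

The script only prints the statements; running them is left as a manual step so the sketch does not depend on a live Hive installation.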
Usage from Hive
Hive and HCatalog (version 0.4 and later) share the same storage abstractions, so you can read from and write to HCatalog tables from within Hive, and vice versa.
However, for HCatalog versions 0.4 and 0.5, Hive does not know where to find the HCatalog jar by default, so if you use any features introduced by HCatalog, such as a table using the JSON SerDe, you might get a "class not found" exception. In this situation, before you run Hive, set the environment variable HIVE_AUX_JARS_PATH to the directory with your HCatalog jar. (If you followed the examples in the Installation document, that should be /usr/local/hcat/share/hcatalog/.)
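For HCatalog 0.4 and 0.5, the setting might look like the following sketch. The directory is the one from the Installation document's examples and is only an assumption about your layout; HCAT_JAR_DIR is a helper variable introduced here for illustration.

```shell
# Assumed location of the HCatalog jar, taken from the Installation document's examples.
# Adjust this to wherever the jar lives on your system.
HCAT_JAR_DIR="/usr/local/hcat/share/hcatalog/"

# Export the variable before starting Hive so the launcher can pick it up.
export HIVE_AUX_JARS_PATH="$HCAT_JAR_DIR"

echo "HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH"
```

Start Hive from the same shell session afterwards so that it inherits the exported variable.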
After version 0.5, HCatalog is part of the Hive distribution and you do not have to add the HCatalog jar to HIVE_AUX_JARS_PATH.
CTAS Issue with JSON SerDe
Using the Hive CREATE TABLE ... AS SELECT command with a JSON SerDe results in a table that has column headers such as "_col0", which can be read by HCatalog or Hive but cannot be easily read by external users. To avoid this issue, create the table in two steps instead of using CTAS:
- CREATE TABLE ...
- INSERT OVERWRITE TABLE ... SELECT ...
See HCATALOG-436 for details.
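The two-step approach could be sketched as follows. The table names (page_counts_json, page_views), the columns, and the SerDe class name are hypothetical choices for illustration, not taken from the issue report.

```shell
# Write the two-step alternative to CTAS into a temporary script file.
hql_file="$(mktemp)"

cat > "$hql_file" <<'EOF'
-- Step 1: declare the table with explicit column names,
-- so the JSON output does not fall back to generated names such as _col0.
-- The SerDe class name is an assumption; verify it for your Hive/HCatalog version.
CREATE TABLE page_counts_json (page STRING, hits BIGINT)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- Step 2: populate the table from a query (page_views is a hypothetical source table).
INSERT OVERWRITE TABLE page_counts_json
SELECT page, COUNT(*) FROM page_views GROUP BY page;
EOF

# To execute the statements (requires a Hive installation): hive -f "$hql_file"
cat "$hql_file"
```

Because the column names come from the CREATE TABLE statement rather than from the SELECT list, the resulting JSON records carry the names external readers expect.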