Summary of Hive Parquet support
Hive 0.10, 0.11, and 0.12
To use Parquet with Hive 0.10 through 0.12, you must download the Parquet Hive package from the Parquet project. The jar you need is parquet-hive-bundle, which is available from Maven Central.
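Once the bundle is downloaded, one common way to make it visible to a Hive CLI session is the `ADD JAR` command, run before creating or querying Parquet tables. This is a minimal sketch. The path and version below are placeholders, not real values.

```sql
-- Placeholder path and version: substitute the location of the
-- parquet-hive-bundle jar you downloaded from Maven Central.
ADD JAR /path/to/parquet-hive-bundle-<version>.jar;
```

`ADD JAR` only affects the current session, so it must be repeated in each new session unless the jar is configured globally.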
Hive 0.13
Native Parquet support is pending for 0.13 via HIVE-5783.
Introduction to Parquet
Parquet (http://parquet.io/) is an ecosystem-wide columnar format for Hadoop. At the time of this writing it supports:
Engines
- Apache Hive
- Apache Drill
- Cloudera Impala
- Apache Crunch
- Apache Pig
- Cascading
Data description
- Apache Avro
- Apache Thrift
- Google Protocol Buffers
For the latest information on supported engines and data description formats, see the Parquet-MR project's feature matrix.
File Format
The Parquet project has an in-depth description of the format, including motivations and diagrams.
HiveQL Syntax
Hive 0.10 - 0.12
CREATE TABLE parquet_test (
id int,
str string,
mp MAP<STRING,STRING>,
lst ARRAY<STRING>,
strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
Hive 0.13
CREATE TABLE parquet_test (
id int,
str string,
mp MAP<STRING,STRING>,
lst ARRAY<STRING>,
strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
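After the table is populated, its complex columns can be read with ordinary HiveQL accessors, the same as for any other storage format. The following is a short illustrative sketch. The partition value 'a' and the map key 'k' are hypothetical and are used only for illustration.

```sql
-- Read scalar, map, array and struct fields from one partition.
-- 'a' and 'k' are hypothetical values chosen for illustration.
SELECT id,
       mp['k'],  -- value stored under map key 'k'
       lst[0],   -- first element of the array (0-based index)
       strct.A   -- field A of the struct
FROM parquet_test
WHERE part = 'a';
```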
Limitations
- Support for the binary, timestamp, date, char, varchar, and decimal types is pending (HIVE-6384)
- Support for CREATE TABLE AS SELECT (CTAS) and column renaming is pending (HIVE-6375)
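Until CTAS is supported, one possible workaround is to create the Parquet table explicitly and then populate it with `INSERT OVERWRITE ... SELECT`. This sketch uses the Hive 0.13 syntax and assumes a hypothetical existing table named `text_source` with `id` and `str` columns. The target name `parquet_copy` is also hypothetical.

```sql
-- Step 1: declare the target table up front instead of using CTAS.
-- parquet_copy and text_source are hypothetical names.
CREATE TABLE parquet_copy (
id int,
str string)
STORED AS PARQUET;

-- Step 2: copy the rows from the existing table into the Parquet table.
INSERT OVERWRITE TABLE parquet_copy
SELECT id, str
FROM text_source;
```

The example uses an unpartitioned target to keep the sketch simple. Writing into a partitioned table such as parquet_test would additionally need a `PARTITION (...)` clause.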