Parquet

Summary of Hive Parquet support

Hive 0.10, 0.11, and 0.12

To use Parquet with Hive 0.10 through 0.12, download the Parquet Hive package from the Parquet project. The artifact you need is the parquet-hive-bundle jar, which is available from Maven Central.
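
Once the jar is downloaded, it has to be visible to the Hive session before the Parquet SerDe classes can be resolved. A minimal sketch, assuming the jar was saved to a local path (the path below is a hypothetical placeholder, not a location defined by the Parquet project):

```sql
-- Hypothetical location: replace with the path of the parquet-hive-bundle jar you downloaded.
ADD JAR /path/to/parquet-hive-bundle.jar;

-- Confirm that the jar is registered for the current session.
LIST JARS;
```

ADD JAR only affects the current session, so the statement would need to be repeated (or placed in an initialization script) for each new session.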

Hive 0.13

Native Parquet support is pending for 0.13 via HIVE-5783.

Introduction to Parquet

Parquet (http://parquet.io/) is an ecosystem-wide columnar format for Hadoop. At the time of this writing, it supports:

Engines

Apache Hive
Apache Drill
Cloudera Impala
Apache Crunch
Apache Pig
Cascading

Data description

Apache Avro
Apache Thrift
Google Protocol Buffers

For the latest information on Parquet formats and data description, see the Parquet-MR project's feature matrix.

File Format

The Parquet project has an in-depth description of the format, including motivations and diagrams.

Hive QL Syntax

Hive 0.10 - 0.12

CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Hive 0.13

CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
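
With either table definition in place, the table can be populated and queried like any other partitioned Hive table. The following is a sketch only: staging_text is a hypothetical existing table with the same five columns, and the partition value 'demo' and map key 'key' are illustrative.

```sql
-- staging_text is a hypothetical source table with columns id, str, mp, lst, strct.
INSERT OVERWRITE TABLE parquet_test PARTITION (part = 'demo')
SELECT id, str, mp, lst, strct
FROM staging_text;

-- Read back nested values: map lookup, array index, and struct member access.
SELECT id, mp['key'], lst[0], strct.a
FROM parquet_test
WHERE part = 'demo';
```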

Limitations

Support for the binary, timestamp, date, char, varchar, and decimal types is pending (HIVE-6384).
Support for CREATE TABLE AS SELECT (CTAS) and for renaming columns is pending (HIVE-6375).
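
While CTAS is unavailable, one possible alternative is to declare the target table explicitly and then fill it with a separate INSERT statement. This is a sketch using the Hive 0.13 syntax, not a documented workaround; parquet_copy is a hypothetical table name, and only scalar columns are copied to keep the example short.

```sql
-- Step 1: declare the target table up front instead of deriving it from a SELECT.
CREATE TABLE parquet_copy (
  id int,
  str string)
STORED AS PARQUET;

-- Step 2: populate it from the existing Parquet table.
INSERT OVERWRITE TABLE parquet_copy
SELECT id, str
FROM parquet_test;
```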
