
Parquet Motivation

We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.

Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.

Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult-to-set-up dependencies.

...

Native Parquet Support

Hive 0.10, 0.11, and 0.12

...

Native Parquet support was added to Hive 0.13 via HIVE-5783. Please note that not all Parquet data types are supported in this version. Support for the remaining data types was added later through HIVE-6384 (see Versions and Limitations below).

HiveQL Syntax

A CREATE TABLE statement can specify the Parquet storage format with syntax that depends on the Hive version.

...

CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
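The following is a usage sketch that fills the parquet_test table above with an ordinary INSERT ... SELECT and then reads the nested values back. The source table src_table, its columns id, k, and v, the partition value 'p1', and the map key 'some_key' are all hypothetical placeholders for your own data.

```sql
-- Hypothetical source table: src_table(id int, k string, v string)
INSERT OVERWRITE TABLE parquet_test PARTITION (part = 'p1')
SELECT id,
       -- str
       v,
       -- mp: MAP<STRING,STRING>
       map(k, v),
       -- lst: ARRAY<STRING>
       array(k, v),
       -- strct: STRUCT<A:STRING,B:STRING>
       named_struct('a', k, 'b', v)
FROM src_table;

-- Read nested values back: map lookup, array index, struct field access
SELECT id, mp['some_key'], lst[0], strct.a
FROM parquet_test
WHERE part = 'p1';
```

Complex-type values are built with the map(), array(), and named_struct() functions in a SELECT, rather than with INSERT ... VALUES.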

Versions and Limitations

...

Hive 0.13.0

Support was added for CREATE TABLE AS SELECT (CTAS, HIVE-6375).
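The following is a minimal CTAS sketch. It assumes the parquet_test table from the CREATE TABLE example above exists, and the target name parquet_ctas is hypothetical. The target is left unpartitioned because CTAS in these Hive versions cannot create a partitioned table.

```sql
-- Create and populate a new Parquet table in one statement.
-- parquet_ctas is a hypothetical table name.
CREATE TABLE parquet_ctas
STORED AS PARQUET
AS
SELECT id, str, mp, lst, strct
FROM parquet_test;
```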

Hive 0.14.0

Support was added for timestamp (HIVE-6394), decimal (HIVE-6367), and char and varchar (HIVE-7735) data types. Support was also added for column rename with use of the flag parquet.column.index.access (HIVE-6938).
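The following sketch shows the Hive 0.14.0 additions. The table parquet_types and its column names are hypothetical. With parquet.column.index.access set to true, Hive matches Parquet file columns to table columns by position instead of by name, so existing files still line up after a column is renamed. Supplying the flag with a session-level SET is an assumption about how you configure it. Check how your deployment passes job configuration.

```sql
-- Hypothetical table using the data types added in Hive 0.14.0
CREATE TABLE parquet_types (
  -- timestamp: HIVE-6394
  event_ts timestamp,
  -- decimal: HIVE-6367
  amount decimal(10,2),
  -- char and varchar: HIVE-7735
  code char(4),
  label varchar(64))
STORED AS PARQUET;

-- Resolve Parquet columns by position rather than by name (HIVE-6938),
-- so files written before the rename remain readable.
SET parquet.column.index.access=true;

-- Rename column label to description, keeping its type
ALTER TABLE parquet_types CHANGE COLUMN label description varchar(64);

SELECT event_ts, amount, code, description FROM parquet_types;
```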

Parquet column names were previously case sensitive (queries had to use column case that matched exactly what was in the metastore), but became case insensitive (HIVE-7554).
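The following sketch shows where case insensitivity matters. It assumes Parquet files written by some other tool whose schema uses the mixed-case column name userId. The table name parquet_external and the LOCATION path are hypothetical placeholders.

```sql
-- Hypothetical external table over existing Parquet files whose
-- schema contains the mixed-case column name "userId".
CREATE EXTERNAL TABLE parquet_external (userid bigint)
STORED AS PARQUET
LOCATION '/path/to/parquet/files';

-- From Hive 0.14.0 (HIVE-7554), the Hive column userid is matched to
-- the Parquet column userId without regard to case.
SELECT userid FROM parquet_external;
```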

Hive 1.1.0

Support was added for binary data types (HIVE-7073).
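The following is a minimal sketch of a Parquet table with a binary column. The table name parquet_binary is hypothetical, and the data is taken from the parquet_test table in the CREATE TABLE example above. Casting a string to binary is just one convenient way to produce binary values for illustration.

```sql
-- Hypothetical table with a binary column (Hive 1.1.0 and later)
CREATE TABLE parquet_binary (
  id int,
  payload binary)
STORED AS PARQUET;

-- Produce binary values by casting an existing string column
INSERT INTO TABLE parquet_binary
SELECT id, CAST(str AS binary)
FROM parquet_test;

-- hex() accepts binary input, which makes the stored bytes visible
SELECT id, hex(payload) FROM parquet_binary;
```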

Hive 1.2.0

Support for remaining Parquet data types was added (HIVE-6384).
