Version

Parquet is supported by a plugin in Hive 0.10, 0.11, and 0.12, and natively in Hive 0.13.


Summary

Hive 0.10, 0.11, and 0.12

To use Parquet with Hive 0.10-0.12, you must download the Parquet Hive package from the Parquet project. The artifact you need is the parquet-hive-bundle jar, which is available from Maven Central.
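As a hedged sketch, the Python function below builds a direct download URL for a jar using Maven Central's standard repository layout: the group ID's dots become path separators, followed by the artifact ID, the version, and `<artifactId>-<version>.jar`. The function name is hypothetical, the group ID `com.twitter` is an assumption about how parquet-hive-bundle was published, and `VERSION` is a placeholder; check Maven Central for the actual coordinates and pick the release that matches your Parquet version. Once downloaded, the jar still has to be made visible to Hive, for example with Hive's `ADD JAR` command.

```python
def maven_central_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a jar download URL following the Maven 2 repository layout.

    Layout: <base>/<groupId with '.' -> '/'>/<artifactId>/<version>/<artifactId>-<version>.jar
    """
    base = "https://repo1.maven.org/maven2"
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


# Assumed coordinates; "VERSION" is a placeholder, not a real release number.
url = maven_central_jar_url("com.twitter", "parquet-hive-bundle", "VERSION")
print(url)
```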

...

Native Parquet support is pending for Hive 0.13 via HIVE-5783.

Introduction

Parquet (http://parquet.io/) is an ecosystem-wide columnar format for Hadoop. At the time of this writing, it supports:

...
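To make "columnar" concrete, here is a conceptual Python sketch (Python is an arbitrary choice, since this page has no code of its own). It transposes row-oriented records into one list per column, so a query that needs a single column reads only that list. The function name `to_columns` and the sample data are hypothetical, and this is not Parquet's on-disk format, which additionally organizes data into row groups and pages and encodes and compresses each column chunk.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into a column-oriented layout.

    Assumes every record is flat and has the same keys as the first one.
    Conceptual only: this is not how Parquet lays data out on disk.
    """
    if not rows:
        return {}
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


rows = [
    {"id": 1, "country": "US", "amount": 9.5},
    {"id": 2, "country": "US", "amount": 3.0},
    {"id": 3, "country": "DE", "amount": 7.25},
]
columns = to_columns(rows)
# A query that needs only "country" can read this one list
# without touching "id" or "amount".
print(columns["country"])  # ['US', 'US', 'DE']
```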

The Parquet project has an in-depth description of the format, including motivations and diagrams.

Parquet Motivation

We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.

Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.

Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.
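The per-column encoding point in the motivation above can be illustrated with a plain run-length encoder. Storing a column's values contiguously tends to place equal values next to each other, which is exactly what run-length encoding exploits. This is a minimal sketch for illustration only: Parquet's actual encodings (such as dictionary encoding and its RLE/bit-packing hybrid) are more involved, and the function names here are hypothetical.

```python
from itertools import groupby
from typing import Any, List, Tuple


def run_length_encode(column: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(runs: List[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Values within a single column often repeat, especially when sorted.
country = ["US", "US", "US", "DE", "DE", "FR"]
encoded = run_length_encode(country)
print(encoded)  # [('US', 3), ('DE', 2), ('FR', 1)]
assert run_length_decode(encoded) == country
```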

Versions

Hive 0.10, 0.11, and 0.12

To use Parquet with Hive 0.10-0.12, you must download the Parquet Hive package from the Parquet project. The artifact you need is the parquet-hive-bundle jar, which is available from Maven Central.

Hive 0.13

Native Parquet support is pending for Hive 0.13 via HIVE-5783.

HiveQL Syntax

Hive 0.10-0.12

...