Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: update External Tables with HIVE-6109

Dynamic Partitioning

Table of Contents

Overview

When writing data in HCatalog it is possible to write all records to a single partition. In this case the partition column(s) need not be in the output data.

The following Pig script illustrates this:

No Format

A = load 'raw' using HCatLoader(); 
... 
split Z into for_us if region='us', for_eu if region='eu', for_asia if region='asia'; 
store for_us into 'processed' using HCatStorer("ds=20110110, region=us"); 
store for_eu into 'processed' using HCatStorer("ds=20110110, region=eu"); 
store for_asia into 'processed' using HCatStorer("ds=20110110, region=asia"); 

In cases where you want to write data to multiple partitions simultaneously, this can be done by placing partition columns in the data and not specifying partition values when storing the data.

No Format

A = load 'raw' using HCatLoader(); 
... 
store Z into 'processed' using HCatStorer(); 

...

Starting in HCatalog 0.5, dynamic partitioning on external tables was broken (HCATALOG-500). This issue was fixed in Hive 0.12.0 0 by creating dynamic partitions of external tables in locations based on metadata rather than user specifications (HIVE-5011). In a future release, users will be able to customize the locations (see HIVE-6109).

Static partitions for external tables can have user-specified locations.

Hive Dynamic Partitions

Information about Hive dynamic partitions is available here:

...

So this statement...

No Format

store A into 'mytable' using HCatStorer("a=1, b=1");

...is equivalent to any of the following statements, if the data has only values where a=1 and b=1:

No Format

store A into 'mytable' using HCatStorer();
No Format

store A into 'mytable' using HCatStorer("a=1");
No Format

store A into 'mytable' using HCatStorer("b=1");

...

For example, let's say a=1 for all values across our dataset and b takes the values 1 and 2. Then the following statement...

No Format

store A into 'mytable' using HCatStorer();

...is equivalent to either of these statements:

No Format

store A into 'mytable' using HCatStorer("a=1");
No Format

split A into A1 if b='1', A2 if b='2';
store A1 into 'mytable' using HCatStorer("a=1, b=1");
store A2 into 'mytable' using HCatStorer("a=1, b=2");

...

A current code example for writing out a specific partition for (a=1, b=1) would go something like this:

No Format

Map<String, String> partitionValues = new HashMap<String, String>();
partitionValues.put("a", "1");
partitionValues.put("b", "1");
HCatTableInfo info = HCatTableInfo.getOutputTableInfo(dbName, tblName, partitionValues);
HCatOutputFormat.setOutput(job, info);

...

With dynamic partitioning, we simply specify only as many keys as we know about, or as required. It will figure out the rest of the keys by itself and spray out necessary partitions, being able to create multiple partitions with a single job.

 

Panel
titleColorindigo
titleBGColorsilver
titleNavigation Links

Previous: Storage Formats
Next: Notification

Hive design document: Dynamic Partitions
Hive tutorial: Dynamic-Partition Insert
Hive DML: Dynamic Partition Inserts

General: HCatalog ManualWebHCat ManualHive Wiki HomeHive Project Site
Old version of this document (HCatalog 0.5.0): Dynamic Partitioning