Hive needs to be extended to support the following:

create table

Code Block

create table <T> (schema) skewed by (keys) on ('c1', 'c2');

The table becomes a skewed table, and skew information is created for all of its partitions.

For example:

  • create table T (c1 string, c2 string) skewed by (c1) on ('x1');
  • create table T (c1 string, c2 string, c3 string) skewed by (c1, c2) on (('x1', 'x2'), ('y1', 'y2'));
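
In the multi-column example, each value tuple supplies one value per skewed column, so ('x1', 'x2') marks rows with c1 = 'x1' and c2 = 'x2' as skewed. The Python sketch below models that matching rule in memory. SkewSpec and skewed_key are hypothetical names for illustration, not Hive APIs.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class SkewSpec:
    """Hypothetical model of a 'skewed by (columns) on (values)' clause."""

    columns: Tuple[str, ...]              # skewed column names, e.g. ("c1", "c2")
    values: Tuple[Tuple[str, ...], ...]   # one tuple per skewed value, e.g. (("x1", "x2"),)

    def __post_init__(self) -> None:
        # Each skewed value must supply exactly one entry per skewed column.
        for value in self.values:
            if len(value) != len(self.columns):
                raise ValueError(f"{value} does not match columns {self.columns}")

    def skewed_key(self, row: Mapping[str, str]) -> Optional[Tuple[str, ...]]:
        """Return the skewed value this row matches, or None for a non-skewed row."""
        key = tuple(row[c] for c in self.columns)
        return key if key in self.values else None


# skewed by (c1) on ('x1')
single = SkewSpec(columns=("c1",), values=(("x1",),))
# skewed by (c1, c2) on (('x1', 'x2'), ('y1', 'y2'))
multi = SkewSpec(columns=("c1", "c2"), values=(("x1", "x2"), ("y1", "y2")))

print(single.skewed_key({"c1": "x1", "c2": "a"}))             # ('x1',)
print(multi.skewed_key({"c1": "y1", "c2": "y2", "c3": "z"}))  # ('y1', 'y2')
print(multi.skewed_key({"c1": "x1", "c2": "y2", "c3": "z"}))  # None: c1 and c2 must match one tuple together
```

Matching on the whole tuple, rather than column by column, is what distinguishes the multi-column form from several independent single-column skews.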

alter table

Code Block

alter table <T> (schema) skewed by (keys) on ('c1', 'c2');

This is supported at the table level only, not at the partition level.

It will

  • convert a non-skewed table into a skewed table, or
  • alter a skewed table's skewed column names and/or skewed values.

It won't impact partitions created before the alter statement; it only impacts partitions created afterwards.

Code Block

alter table <T> (schema) not skewed;

The above will

  • turn off the "skewed" feature for the table and
  • make the table non-skewed.

It won't impact partitions created before the alter statement; it only impacts partitions created afterwards.
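
Both alter forms follow the same rule: only partitions created after the statement are affected. One way to picture this is that each partition captures the table's skew settings at the moment it is created. The toy in-memory model below assumes exactly that; Table, add_partition, alter_skewed, alter_not_skewed, and the partition names are hypothetical and do not describe how the Hive metastore actually stores skew information.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical skew spec: (skewed columns, skewed value tuples).
Skew = Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]


@dataclass
class Table:
    """Toy model in which each partition snapshots the table's skew spec when created."""

    skew: Optional[Skew] = None                    # None means the table is non-skewed
    partitions: Dict[str, Optional[Skew]] = field(default_factory=dict)

    def add_partition(self, name: str) -> None:
        # New partitions pick up whatever skew spec the table has right now.
        self.partitions[name] = self.skew

    def alter_skewed(self, skew: Skew) -> None:
        # 'alter table <T> (schema) skewed by (keys) on (...)': table level only,
        # so partitions that already exist keep their recorded spec.
        self.skew = skew

    def alter_not_skewed(self) -> None:
        # 'alter table <T> (schema) not skewed': same rule; later partitions are non-skewed.
        self.skew = None


t = Table()
t.add_partition("ds=1")                  # created while the table is non-skewed
t.alter_skewed((("c1",), (("x1",),)))
t.add_partition("ds=2")                  # records the new skew spec
t.alter_not_skewed()
t.add_partition("ds=3")                  # non-skewed again; ds=2 keeps its spec
print(t.partitions)
```

Because the spec is stored per partition, altering the table never has to rewrite or relocate existing partition data.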

Code Block

alter table <T> (schema) set skewed location (key1="loc1", key2="loc2");

The above sets the location for each skewed key.

Design

When such a table is loaded, it would be good to create one sub-directory per skewed key. Infrastructure similar to that used for dynamic partitions can be reused for this.
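
As with dynamic partitions, the target directory of each row can be derived from its column values. The sketch below groups rows by target directory under several assumptions: sub-directories use Hive-style "col=value" naming, rows matching no skewed key go to a directory called "default", and a location given through set skewed location (keyed here by the skewed value tuple) overrides the derived path. route_rows, skew_subdir, DEFAULT_DIR, and the warehouse path are all hypothetical.

```python
import posixpath
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Mapping, Optional, Sequence, Tuple

Key = Tuple[str, ...]
Row = Mapping[str, str]

# Hypothetical name for the directory holding rows that match no skewed key.
DEFAULT_DIR = "default"


def skew_subdir(columns: Sequence[str], key: Key) -> str:
    # Assumed naming: Hive-style "col=value" segments, one per skewed column.
    return "/".join(f"{c}={v}" for c, v in zip(columns, key))


def route_rows(
    partition_path: str,
    columns: Sequence[str],
    skewed_values: AbstractSet[Key],
    rows: Iterable[Row],
    locations: Optional[Mapping[Key, str]] = None,
) -> Dict[str, List[Row]]:
    """Group rows by target directory: one per skewed key, plus a default directory."""
    locations = locations or {}
    out: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in skewed_values:
            # An explicit location (as from 'set skewed location') overrides the derived path.
            derived = posixpath.join(partition_path, skew_subdir(columns, key))
            target = locations.get(key, derived)
        else:
            target = posixpath.join(partition_path, DEFAULT_DIR)
        out[target].append(row)
    return dict(out)


rows = [{"c1": "x1", "c2": "a"}, {"c1": "x1", "c2": "b"}, {"c1": "z", "c2": "c"}]
dirs = route_rows("/warehouse/t/ds=1", ("c1",), {("x1",)}, rows)
for path, group in sorted(dirs.items()):
    print(path, len(group))
# /warehouse/t/ds=1/c1=x1 2
# /warehouse/t/ds=1/default 1
```

Keeping non-skewed rows in their own directory means a query filtering on a skewed value only has to read that value's sub-directory.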
The "alter table <T> partition <P> concatenate;" statement needs to be changed to merge files per directory.
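
Merging per directory means a file is only combined with siblings in the same skewed or default sub-directory, so the directory layout survives the concatenate. The planning step can be sketched as grouping files by parent directory; plan_concatenation and the file paths are hypothetical, and the actual merge of file contents is left out.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def plan_concatenation(files: Iterable[str]) -> Dict[str, List[str]]:
    """Group data files by parent directory so each directory is merged on its own.

    Directories that hold a single file need no merge and are omitted.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        groups[posixpath.dirname(path)].append(path)
    return {d: sorted(fs) for d, fs in groups.items() if len(fs) > 1}


files = [
    "/warehouse/t/ds=1/c1=x1/000000_0",
    "/warehouse/t/ds=1/c1=x1/000001_0",
    "/warehouse/t/ds=1/c1=y1/000000_0",
    "/warehouse/t/ds=1/default/000000_0",
    "/warehouse/t/ds=1/default/000001_0",
    "/warehouse/t/ds=1/default/000002_0",
]
for directory, group in sorted(plan_concatenation(files).items()):
    print(directory, len(group))
# /warehouse/t/ds=1/c1=x1 2
# /warehouse/t/ds=1/default 3
```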