...
Hive needs to be extended to support the following:
Create table
```sql
create table <T> (schema) skewed by (keys) on ('c1', 'c2');
```
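The table will be a skewed table, and the skew information will be created for all of its partitions. The "on" clause lists skewed values, not column names. When one column is skewed, each entry is a single value. When several columns are skewed, each entry is a tuple of values given in the order of the "skewed by" columns.
For example:
- create table T (c1 string, c2 string) skewed by (c1) on ('x1');
- create table T (c1 string, c2 string, c3 string) skewed by (c1, c2) on (('x1', 'x2'), ('y1', 'y2'));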
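To make the semantics of the "on" clause concrete, here is a minimal Python sketch (not Hive code) that decides whether a row belongs to a skewed group. It assumes rows are plain dicts keyed by column name and that a skew spec is held as a tuple of key columns plus a tuple of value tuples. SkewSpec and is_skewed are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class SkewSpec:
    """Hypothetical in-memory form of `skewed by (keys) on (values)`."""
    columns: Tuple[str, ...]             # e.g. ("c1", "c2")
    values: Tuple[Tuple[str, ...], ...]  # e.g. (("x1", "x2"), ("y1", "y2"))

    def __post_init__(self) -> None:
        # Every skewed value tuple must supply exactly one value per skewed column.
        for v in self.values:
            if len(v) != len(self.columns):
                raise ValueError(f"{v!r} does not match columns {self.columns!r}")


def is_skewed(row: Mapping[str, str], spec: SkewSpec) -> bool:
    """Return True if the row's key-column values equal one of the listed tuples."""
    key = tuple(row[c] for c in spec.columns)
    return key in spec.values


# skewed by (c1) on ('x1')
single = SkewSpec(columns=("c1",), values=(("x1",),))
# skewed by (c1, c2) on (('x1', 'x2'), ('y1', 'y2'))
multi = SkewSpec(columns=("c1", "c2"), values=(("x1", "x2"), ("y1", "y2")))

print(is_skewed({"c1": "x1", "c2": "zz"}, single))           # True
print(is_skewed({"c1": "x1", "c2": "y2", "c3": "q"}, multi))  # False
```

Under the second spec, a row with c1 = 'x1' and c2 = 'y2' is not skewed: only the listed value combinations count, not each column value on its own.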
Alter table
```sql
alter table <T> (schema) skewed by (keys) on ('c1', 'c2');
```
The above is supported at the table level only, not at the partition level.
It will
- convert a non-skewed table into a skewed table, or
- alter a skewed table's skewed column names and/or skewed values.
It won't
- impact partitions created before the alter statement; only partitions created afterwards are affected.
```sql
alter table <T> (schema) not skewed;
```
The above will
- turn off the "skewed" feature for the table, making it a non-skewed table.
It won't
- impact partitions created before the alter statement; only partitions created afterwards are affected.
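One way to get the behaviour described above, where an alter statement leaves existing partitions untouched, is for each partition to copy the table's skew spec at the moment the partition is created. The Python sketch below models that idea. Table, Partition and the method names are hypothetical, and the spec is simplified to a (columns, values) tuple, or None for a non-skewed table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple

# (skewed columns, skewed value tuples); None means "not skewed".
Spec = Optional[Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]]


@dataclass
class Partition:
    name: str
    skew: Spec  # snapshot of the table's spec taken when the partition was created


@dataclass
class Table:
    skew: Spec = None
    partitions: Dict[str, Partition] = field(default_factory=dict)

    def add_partition(self, name: str) -> Partition:
        # A new partition inherits whatever skew spec the table has right now.
        p = Partition(name, self.skew)
        self.partitions[name] = p
        return p

    def alter_skewed_by(self, columns: Sequence[str],
                        values: Sequence[Sequence[str]]) -> None:
        # "alter table ... skewed by ... on ...": changes the table-level spec only.
        self.skew = (tuple(columns), tuple(tuple(v) for v in values))

    def alter_not_skewed(self) -> None:
        # "alter table ... not skewed": clears the table-level spec only.
        self.skew = None


t = Table()
t.add_partition("ds=1")              # created while the table is non-skewed
t.alter_skewed_by(["c1"], [["x1"]])
t.add_partition("ds=2")              # picks up the new skew spec
t.alter_not_skewed()
t.add_partition("ds=3")              # non-skewed again

for name, p in t.partitions.items():
    print(name, p.skew)
```

Here ds=1 and ds=3 end up non-skewed, while ds=2 keeps the spec that was in force when it was created; neither alter statement rewrites an existing partition.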
```sql
alter table <T> (schema) set skewed location (key1="loc1", key2="loc2");
```
The above sets the storage location for each listed skewed key.
Design
When such a table is being loaded, it would be good to create a sub-directory per skewed key. Infrastructure similar to that used for dynamic partitions can be reused for this.
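A minimal sketch of that load path, written in Python rather than against Hive's code base: rows are grouped by a sub-directory name derived from their skewed key values, and every row whose key is not in the skewed list goes to one shared default directory. The "c1=x1/c2=x2" naming borrows the dynamic-partition convention as an assumption, and DEFAULT_DIR, subdir_for and route_rows are hypothetical names; the actual directory layout is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

DEFAULT_DIR = "default"  # hypothetical directory for rows with non-skewed keys


def subdir_for(columns: Sequence[str], key: Tuple[str, ...]) -> str:
    # Assumed naming, mirroring dynamic partitions: col=value segments joined by "/".
    return "/".join(f"{c}={v}" for c, v in zip(columns, key))


def route_rows(
    rows: Iterable[Mapping[str, str]],
    columns: Sequence[str],
    skewed_values: Iterable[Tuple[str, ...]],
) -> Dict[str, List[Mapping[str, str]]]:
    """Group rows by the sub-directory they would be written to."""
    skewed = set(skewed_values)
    out: Dict[str, List[Mapping[str, str]]] = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in columns)
        target = subdir_for(columns, key) if key in skewed else DEFAULT_DIR
        out[target].append(row)
    return dict(out)


rows = [
    {"c1": "x1", "c2": "x2", "c3": "a"},
    {"c1": "x1", "c2": "x2", "c3": "b"},
    {"c1": "y1", "c2": "y2", "c3": "c"},
    {"c1": "z1", "c2": "z2", "c3": "d"},
]
layout = route_rows(rows, ["c1", "c2"], [("x1", "x2"), ("y1", "y2")])
for d, rs in sorted(layout.items()):
    print(d, len(rs))
```

Sending all non-skewed keys to a single directory keeps the number of sub-directories bounded by the number of listed skewed values plus one, which is the main difference from routing by every distinct value as dynamic partitions do.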
alter table <T> partition <P> concatenate; needs to be changed to merge files per directory, so that files from different skewed-key sub-directories are never merged together.
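The per-directory rule can be illustrated by how a merge plan might be built: files are grouped by their parent directory, and only files that share a directory are merged together. plan_concatenate is a hypothetical helper and the warehouse paths are made up; this is a Python sketch of the idea, not Hive's concatenate implementation.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def plan_concatenate(files: Iterable[str]) -> Dict[str, List[str]]:
    """Map each directory to the files that would be merged inside it.

    Directories holding a single file are skipped, since there is nothing to merge.
    """
    by_dir: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        by_dir[posixpath.dirname(f)].append(f)
    return {d: sorted(fs) for d, fs in by_dir.items() if len(fs) > 1}


# Made-up file layout for one partition of a skewed table.
files = [
    "/warehouse/t/ds=1/c1=x1/000000_0",
    "/warehouse/t/ds=1/c1=x1/000001_0",
    "/warehouse/t/ds=1/default/000000_0",
    "/warehouse/t/ds=1/default/000001_0",
    "/warehouse/t/ds=1/default/000002_0",
    "/warehouse/t/ds=1/c1=y1/000000_0",
]
for d, group in sorted(plan_concatenate(files).items()):
    print(d, "->", len(group), "files")
```

Because grouping happens before any merging, a file under c1=x1 can never be combined with one under default or c1=y1, which preserves the sub-directory layout created at load time.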