Comment: update with links to DDL doc

...

The assumption is that B has only a few rows whose keys are skewed in A, so these rows can be loaded into memory.

Hive Enhancements

Original plan:  The skew data will be obtained from list bucketing (see the List Bucketing design document: https://cwiki.apache.org/confluence/display/Hive/ListBucketing). There will be no additions to the Hive grammar.

Implementation:  Starting in Hive 0.10.0, tables can be created as skewed or altered to be skewed (in which case partitions created after the ALTER statement will be skewed). In addition, skewed tables can use the list bucketing feature by specifying the STORED AS DIRECTORIES option. See the DDL documentation for details: Create Table, Skewed Tables, and Alter Table Skewed or Stored as Directories.
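As a rough illustration of the statements described above, the following HiveQL sketch creates a skewed table, adds list bucketing, and alters an existing table to be skewed. The table names (skewed_example, existing_table), the column names, and the skewed key values are hypothetical and chosen only for this example; the DDL documentation remains the reference for the exact syntax.

```sql
-- Hypothetical table: rows with key 1, 5, or 6 are declared as skewed.
CREATE TABLE skewed_example (key STRING, value STRING)
  SKEWED BY (key) ON (1, 5, 6);

-- Same declaration, with list bucketing enabled so that each skewed
-- value is stored in its own directory.
CREATE TABLE skewed_example_lb (key STRING, value STRING)
  SKEWED BY (key) ON (1, 5, 6) STORED AS DIRECTORIES;

-- Altering an existing (hypothetical) table to be skewed. Only partitions
-- created after this statement are skewed.
ALTER TABLE existing_table SKEWED BY (key) ON (1, 5, 6);

-- Turning skew off again.
ALTER TABLE existing_table NOT SKEWED;
```

In this sketch a single column is skewed; SKEWED BY also accepts several columns, in which case each skewed value is written as a tuple.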