Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3
Table of Contents

Overview

In Hive, Map-Join is a technique that materializes data for all tables involved in the join except for the largest table and then large table is streamed over the materialized data from small tables. Map-Join is often a good join approach for star-schema joins where the fact table will be streamed over materialized dimension tables.

Problem

Map-Join predicates where the joining columns from big table (streamed table) are partition columns and corresponding columns from small table is not partitioned, the join would not prune the unnecessary partitions from big table. Since data for all small tables is materialized before big table is streamed, theoretically it would be possible to prune the unnecessary partitions from big table.

HIVE-5119 has been created to track this feature improvement.

Proposed Solution

Figure out the set of values from all small tables for each join column from big table (that is partition key). Using these set of values figure out the partitions from big table that should be scanned using metadata. Change the partitions to be scanned for big table before Map-Join starts streaming big table. This feature would be turned on only through an explicit configuration (name of that configuration is TBD).

Possible Extensions

• If LHS and RHS of join predicate are partitioned then for tables from inner side, Partitions can be decided statically at compile time.
• Even if the Big Table columns are not partitioned, the set of values generated from small tables could be pushed down as a predicate on the big table. Storage Handlers like ORC, which can handle predicate push down, could take advantage of this.

Optimization Details

This optimization has compile time and run/execution time pieces to it. Compile time optimizations would happen as part of physical optimizer as one of the last optimizations (before inferring bucket sort). Run/Execution time optimizations would happen as part of MRLocalTask execution and before launching MapRedTask for Map-Join.

Compile Time

1. Identify Map Join Operators that can participate in partition pruning.
2. For each of the Map-Join operator in the task, identify columns from big table that can participate in the partition pruning.

...

NOTE:
• This requires adding a terminal operator to the operator DAG in the MapRedLocalTask.
• Note that the new terminal operator would get tuples from all small tables of interest (just like HashTableSink Operator).
• Cascading Map-Join operators (joining on different keys in the same task using same big table) would still use the same terminal operator in the MapRedLocalTask.

Runtime

1. As tuples flow in to the new terminal operator in MapRedLocal task, it would extract columns of interest and would add it to a set of values for that column.

...

Assumptions:
• In HIVE currently Join predicates can only include conjunctions.
• Hive only supports Equijoin

Pseudo Code

1. Walk through Task DAG looking for MapredTask. Perform #2 - #6 for each such MapRedTask.
2. Skip Task if it contains backup join plan (i.e if not MAPJOIN_ONLY_NOBACKUP or if backupTask is not null).
    NOTE:
    This is aggressive; in my limited exposure to the hive code, it seemed like conditional tasks are currently set only for joins.

...