Overview

In Hive, Map-Join is a join technique that materializes the data for all tables involved in the join except the largest one, and then streams the large table over the materialized data from the small tables. Map-Join is often a good join approach for star-schema joins, where the fact table is streamed over the materialized dimension tables.
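As a minimal illustration of the idea (hypothetical types and names, not Hive's actual operator code), the Java sketch below materializes a small table into a hash map keyed on its join column and then streams the big table over it, probing one row at a time:

    import java.util.*;

    // Minimal map-join sketch (hypothetical types, not Hive internals).
    class MapJoinSketch {
        // Materialize a small table into a hash map keyed on its join column.
        static Map<Object, List<Object[]>> buildHashTable(List<Object[]> smallTable, int keyCol) {
            Map<Object, List<Object[]>> table = new HashMap<>();
            for (Object[] row : smallTable) {
                table.computeIfAbsent(row[keyCol], k -> new ArrayList<>()).add(row);
            }
            return table;
        }

        // Stream the big table over the materialized small table and emit joined rows.
        static List<Object[]> mapJoin(Iterable<Object[]> bigTable, int bigKeyCol,
                                      Map<Object, List<Object[]>> smallHash) {
            List<Object[]> out = new ArrayList<>();
            for (Object[] bigRow : bigTable) {                 // big table is streamed, never materialized
                List<Object[]> matches = smallHash.get(bigRow[bigKeyCol]);
                if (matches == null) continue;                 // inner join: drop non-matching fact rows
                for (Object[] smallRow : matches) {
                    Object[] joined = new Object[bigRow.length + smallRow.length];
                    System.arraycopy(bigRow, 0, joined, 0, bigRow.length);
                    System.arraycopy(smallRow, 0, joined, bigRow.length, smallRow.length);
                    out.add(joined);
                }
            }
            return out;
        }
    }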

...

• If both the LHS and RHS columns of a join predicate are partition columns, then for the table on the inner side of the join the set of partitions to scan can be decided statically at compile time.
• Even if the big table columns are not partition columns, the set of values generated from the small tables could be pushed down as a predicate on the big table. Storage handlers like ORC, which can handle predicate pushdown, could take advantage of this (a sketch follows below).
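To make the second bullet concrete, here is a hedged sketch of how the distinct join-key values collected from a small table could be turned into an IN-style predicate on the big table's join column, which a reader that supports predicate pushdown could then evaluate early. The class and method names are made up for the example and do not correspond to an actual Hive or ORC API:

    import java.util.*;
    import java.util.function.Predicate;

    // Hypothetical illustration of pushing small-table join values down to the big table scan.
    class ValuePushdownSketch {
        // Collect the distinct values of the small table's join column.
        static Set<Object> collectJoinValues(List<Object[]> smallTable, int keyCol) {
            Set<Object> values = new HashSet<>();
            for (Object[] row : smallTable) {
                values.add(row[keyCol]);
            }
            return values;
        }

        // Build an IN-list predicate over the big table's join column; a reader that
        // supports predicate pushdown (e.g., ORC-style) could use it to skip data
        // that cannot match, instead of filtering after the scan.
        static Predicate<Object[]> inListPredicate(Set<Object> values, int bigKeyCol) {
            return bigRow -> values.contains(bigRow[bigKeyCol]);
        }
    }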

Optimization Details

This optimization has both compile-time and run/execution-time pieces. The compile-time work happens in the physical optimizer, as one of the last optimizations (before inferring bucket sort). The run/execution-time work happens during MapRedLocalTask execution, before the MapRedTask for the Map-Join is launched.

...

1. Identify the Map-Join operators that can participate in partition pruning.
2. For each Map-Join operator in the task, identify the big table columns that can participate in partition pruning (see the first sketch after this list).

    Columns identified from the big table have the following characteristics:
      • They are part of the join condition.
      • The big table is on the inner side of the join.
      • The columns are not involved in any functions in the join conditions.
      • The column value is not mutated (no function applied) between the table scan and the join condition.
      • The column is a partition column.

3. Identify the small tables, and the columns from those small tables, that can participate in partition pruning.
    Columns identified from a small table have the following characteristics:
      • The column is on the other side of a predicate in the join condition whose big table column has been identified as a target for partition pruning.
      • The column is not part of any function in the join predicate.
      • The column is part of a join in which the big table is on the outer side.

4. Modify MapRedLocalTask to assemble the set of values for each small-table column that participates in partition pruning, and to generate the PartitionDesc for the big table.

NOTE:
• This requires adding a terminal operator to the operator DAG in the MapRedLocalTask (see the wiring sketch after this list).
• The new terminal operator would receive tuples from all the small tables of interest (just like the HashTableSink operator).
• Cascading Map-Join operators (joining on different keys in the same task using the same big table) would still share a single terminal operator in the MapRedLocalTask.
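The identification in steps 2 and 3 amounts to walking each pair of join key expressions and keeping only the pairs where the big table side is a bare partition column (no function applied anywhere between the table scan and the join) and the small table side is a bare column as well. The sketch below captures that check with simplified, hypothetical expression types; it is not the actual Hive physical-optimizer code:

    import java.util.*;

    // Simplified, hypothetical expression model (not Hive's ExprNodeDesc classes).
    interface Expr {}
    class ColumnExpr implements Expr {
        final String table;
        final String column;
        ColumnExpr(String table, String column) { this.table = table; this.column = column; }
    }
    class FuncExpr implements Expr {          // any function/UDF wrapping its children
        final List<Expr> children = new ArrayList<>();
    }

    class PruningKeySelector {
        // A prunable pair: big-table partition column <-> the small-table column feeding it.
        static class PrunablePair {
            final ColumnExpr bigTablePartitionCol;
            final ColumnExpr smallTableCol;
            PrunablePair(ColumnExpr b, ColumnExpr s) { bigTablePartitionCol = b; smallTableCol = s; }
        }

        // Keep a join-key pair only if the big side is a bare partition column and
        // the small side is a bare column (a FuncExpr on either side disqualifies it).
        static List<PrunablePair> selectPrunablePairs(List<Expr> bigKeys, List<Expr> smallKeys,
                                                      Set<String> bigTablePartitionCols) {
            List<PrunablePair> result = new ArrayList<>();
            for (int i = 0; i < bigKeys.size(); i++) {
                Expr big = bigKeys.get(i);
                Expr small = smallKeys.get(i);
                if (big instanceof ColumnExpr && small instanceof ColumnExpr) {
                    ColumnExpr bigCol = (ColumnExpr) big;
                    if (bigTablePartitionCols.contains(bigCol.column)) {
                        result.add(new PrunablePair(bigCol, (ColumnExpr) small));
                    }
                }
            }
            return result;
        }
    }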
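For step 4 and the note above, the shape of the change is that one additional terminal operator is attached as a child of each small-table branch of the local task's operator DAG, alongside the existing hash-table sink, and that single instance is shared even when several Map-Join operators in the task prune the same big table. A hedged wiring sketch, with a made-up Op interface rather than Hive's real operator classes:

    import java.util.*;

    // Made-up operator interface for illustration only.
    interface Op {
        List<Op> getChildren();
        void process(Object[] row);
    }

    class LocalTaskWiringSketch {
        // Attach one shared value-collecting terminal operator to every small-table
        // branch that feeds a prunable Map-Join, next to the existing hash-table sink.
        static void attachSharedSink(List<Op> smallTableBranches, Op sharedPartitionDescSink) {
            for (Op branch : smallTableBranches) {
                // Cascading Map-Joins over the same big table reuse the same sink instance,
                // so all relevant small-table tuples flow into one place.
                if (!branch.getChildren().contains(sharedPartitionDescSink)) {
                    branch.getChildren().add(sharedPartitionDescSink);
                }
            }
        }
    }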

...

1. As tuples flow into the new terminal operator in the MapRedLocalTask, it extracts the columns of interest and adds their values to a per-column set.

2. When close is called on the new terminal operator, it generates the list of big table partitions by consulting the Metastore (using the value sets built in #1).
    NOTE:
    • The Metastore would need to answer queries with IN clauses, e.g., give me all partitions of table R where column x in (1, 2, 3) and column y in (5, 6, 7) (see the sketch below).
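As an example of the kind of query the Metastore would need to answer, the sketch below builds a filter expression of the form "x in (1, 2, 3) and y in (5, 6, 7)" from the collected value sets. Whether this is expressed as a filter string or as a new dedicated Metastore call is an open design question; the helper is purely illustrative:

    import java.util.*;

    // Build a filter expression like: x in (1, 2, 3) and y in (5, 6, 7)
    // from the per-column value sets collected by the terminal operator.
    class PartitionFilterSketch {
        static String buildInFilter(Map<String, Set<Object>> valuesByColumn) {
            StringBuilder filter = new StringBuilder();
            for (Map.Entry<String, Set<Object>> e : valuesByColumn.entrySet()) {
                if (filter.length() > 0) {
                    filter.append(" and ");
                }
                StringJoiner inList = new StringJoiner(", ", "(", ")");
                for (Object v : e.getValue()) {
                    inList.add(v instanceof String ? "'" + v + "'" : String.valueOf(v));
                }
                filter.append(e.getKey()).append(" in ").append(inList);
            }
            return filter.toString();
        }
    }

For the example in the note, value sets for x and y would yield a filter equivalent to "x in (1, 2, 3) and y in (5, 6, 7)", which the (enhanced) Metastore would evaluate against table R's partition columns.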

...

  Implementation of PartitionDescGenSink
    a) A map is maintained from each big table column to a HashSet of values.
    b) From each tuple, extract the values corresponding to each column in the set-generation key.
    c) Add these values to the corresponding HashSet.
    d) On close of PartitionDescGenSink, consult the Metastore to get the partitions for the corresponding key columns. This requires potential enhancements to Hive Metastore handling to provide an API such as "Get all partitions where column1 has set1 of values, or column2 has set2 of values."
    e) Write the partition info to a file. The file name and location need to be finalized.
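Putting steps a) through e) together, here is a hedged skeleton of what PartitionDescGenSink could look like. It uses plain Java collections and a made-up Metastore/file interface rather than Hive's real operator, Metastore, and serialization APIs, so treat it as the shape of the implementation, not the implementation itself:

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.*;

    // Sketch of the value-collecting terminal operator (hypothetical interfaces).
    class PartitionDescGenSinkSketch {
        // a) Map from big-table column name to the set of values seen for it.
        private final Map<String, Set<Object>> valuesByColumn = new LinkedHashMap<>();
        // Position of each set-generation-key column in the incoming small-table tuples.
        private final Map<String, Integer> keyColumnPositions;

        PartitionDescGenSinkSketch(Map<String, Integer> keyColumnPositions) {
            this.keyColumnPositions = keyColumnPositions;
            for (String col : keyColumnPositions.keySet()) {
                valuesByColumn.put(col, new HashSet<>());
            }
        }

        // b) + c) For each incoming tuple, extract the key-column values and add them to the sets.
        void process(Object[] tuple) {
            for (Map.Entry<String, Integer> e : keyColumnPositions.entrySet()) {
                valuesByColumn.get(e.getKey()).add(tuple[e.getValue()]);
            }
        }

        // d) + e) On close, ask the (enhanced) Metastore for the matching partitions
        // and write the resulting partition info to a file for the Map-Join task to read.
        void close(MetastoreLookup metastore, Path outputFile) throws IOException {
            List<String> partitions = metastore.getPartitionsByValueSets(valuesByColumn);
            Files.write(outputFile, partitions, StandardCharsets.UTF_8);
        }

        // Made-up Metastore abstraction standing in for the API enhancement described in d).
        interface MetastoreLookup {
            List<String> getPartitionsByValueSets(Map<String, Set<Object>> valuesByColumn);
        }
    }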

...