Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add hive.auto.convert.join.use.nonstaged (HIVE-6144)

...

No Format
set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 1000010000000;

The default for hive.auto.convert.join.noconditionaltask is true which means auto conversion is enabled. (Originally the default was false – see HIVE-3784 – but it was changed to true by HIVE-4146 before Hive 0.11.0 was released.)

The size configuration enables the user to control what size table can fit in memory. This value represents the sum of the sizes of tables that can be converted to hashmaps that fit in memory. Currently, n-1 tables of the join have to fit in memory for the map-join optimization to take effect. There is no check to see if the table is a compressed one or not and what the potential size of the table can be. The effect of this assumption on the results is discussed in the next section.

...

Auto join conversion also affects the sort-merge-bucket joins.

Info
titleVersion 0.13.0 and later

Hive 0.13.0 introduced hive.auto.convert.join.use.nonstaged with a default of false (HIVE-6144).

For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, then it does not need to be pre-staged in the distributed cache via a MapReduce local task. Setting hive.auto.convert.join.use.nonstaged to true will avoid pre-staging in those cases.

 

Current Optimization
  1. Group as many MJ operators as possible into one MJ.

...