Versions Compared

Comment: add hive.optimize.union.remove & hive.mapred.supports.subdirectories (HIVE-3276) and hive.optimize.skewjoin.compiletime (continued)

...

hive.optimize.skewjoin.compiletime
  • Default Value: false
  • Added In: Hive 0.10.0

Whether to create a separate plan for skewed keys for the tables in the join, based on the skewed keys stored in the metadata. At compile time, the plan is broken into two joins: one for the skewed keys and one for the remaining keys. A union is then performed over the two joins. Unless the same skewed key is present in both joined tables, the join for the skewed key is performed as a map-side join.

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op. 
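As a rough illustration of the two settings described above, the following sketch declares skew information in the table metadata and then enables both parameters for a session. The table names, column names, and skewed values are hypothetical and chosen only for this example.

```sql
-- Hypothetical tables; the skewed values 'u1' and 'u2' are illustrative only.
-- SKEWED BY records the skewed keys in the metastore, which is the
-- information hive.optimize.skewjoin.compiletime relies on.
CREATE TABLE clicks (user_id STRING, url STRING)
  SKEWED BY (user_id) ON ('u1', 'u2');
CREATE TABLE users (user_id STRING, name STRING);

-- Enable both the compile-time and the runtime skew join handling.
SET hive.optimize.skewjoin.compiletime=true;
SET hive.optimize.skewjoin=true;

-- With skew metadata present, the plan for this join is expected to be
-- split into a join over the skewed keys and a join over the remaining
-- keys, followed by a union of the two.
SELECT c.url, u.name
FROM clicks c JOIN users u ON c.user_id = u.user_id;
```

If neither table carried skew metadata, the compile-time setting would have no effect and only hive.optimize.skewjoin would apply.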

hive.optimize.union.remove

...

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either of hive.merge.mapfiles or hive.merge.mapredfiles is set to true.
If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea was that the
number of reducers is small, so the number of files is small anyway. However, with this optimization,
we are possibly increasing the number of files by a big margin. So, we merge aggressively.
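A minimal session sketch for trying the union removal together with the merge behavior described above. It assumes the running Hadoop version supports sub-directories, and that hive.mapred.supports.subdirectories needs to be true for the optimization to take effect; the tables src_a, src_b, and dest are hypothetical.

```sql
-- Assumption: the Hadoop version in use supports sub-directories
-- for tables/partitions (see hive.mapred.supports.subdirectories).
SET hive.mapred.supports.subdirectories=true;
SET hive.optimize.union.remove=true;

-- Either merge flag being true triggers the merge of output files.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=false;

-- Hypothetical tables; each branch of the union can write its own
-- output once the union operator is removed from the plan.
INSERT OVERWRITE TABLE dest
SELECT u.key, u.value FROM (
  SELECT key, value FROM src_a
  UNION ALL
  SELECT key, value FROM src_b
) u;
```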

hive.mapred.supports.subdirectories
  • Default Value: false
  • Added In: Hive 0.10.0 with HIVE-3276

...

Whether the version of Hadoop which is running supports sub-directories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports sub-directories for tables/partitions. This support was added by MAPREDUCE-1501.

hive.mapred.mode
  • Default Value: nonstrict
  • Added In:

...