Versions Compared

Comment: add some links and version information

...

  • Default Value: true
  • Added In: Hive 0.4.0

Whether to enable column pruner.

...

  • Default Value: true
  • Added In: Hive 0.4.0

Whether to enable predicate pushdown.

...

  • Default Value: true
  • Added In: Hive 0.7.0

Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.

...

  • Default Value: true
  • Added In: Hive 0.8.1

Whether to transitively replicate predicate filters over equijoin conditions.

...

  • Default Value: 1000
  • Added In: Hive 0.2.0

How many rows in the right-most join operand Hive should buffer before
emitting the join result.

...

  • Default Value: 25000
  • Added In: Hive 0.5.0

How many rows in the joining tables (except the streaming table)
should be cached in memory.

...

  • Default Value: 100
  • Added In: Hive 0.5.0

How many values for each key in the map-joined table should be cached
in memory.

...

  • Default Value: false
  • Added In: Hive 0.6.0

Whether to enable skew join optimization.  (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key
  • Default Value: 100000
  • Added In: Hive 0.6.0

Determines whether a join key is treated as skewed. If the join operator sees more than the specified number of rows with the same key, that key is treated as a skew join key.
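
The threshold rule above can be illustrated with a minimal sketch. This is a simplified model, not Hive's implementation: the function name find_skew_keys is hypothetical, and it counts all keys up front, whereas a join operator would see rows as they stream through.

```python
from collections import Counter


def find_skew_keys(join_keys, threshold=100000):
    """Return the set of keys that appear in more than `threshold` rows.

    `threshold` plays the role of hive.skewjoin.key (default 100000).
    Hypothetical helper for illustration only.
    """
    counts = Counter(join_keys)
    # "More than the specified number of rows" means a strict comparison.
    return {key for key, n in counts.items() if n > threshold}


# With a small threshold, only the heavily repeated key is flagged.
rows = ["a"] * 5 + ["b"] * 2
print(find_skew_keys(rows, threshold=3))
```

A key whose row count equals the threshold exactly is not flagged in this sketch, because the description says "more than".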

...

  • Default Value: 10000
  • Added In: Hive 0.6.0

Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split for fine-grained control.

...

  • Default Value: 33554432
  • Added In: Hive 0.6.0

Determines the maximum number of map tasks used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks for fine-grained control.
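
How the two settings interact can be sketched roughly as follows. This assumes the classic Hadoop FileInputFormat split rule (split size = max(minimum split size, min(goal size, block size))) and a 128 MiB block size; whether a given Hive job follows exactly this rule depends on the input format in use, so treat the function estimate_map_tasks as a hypothetical estimate rather than Hive's actual logic.

```python
import math


def estimate_map_tasks(total_bytes, map_tasks=10000, min_split=33554432,
                       block_size=128 * 1024 * 1024):
    """Estimate how many map tasks a follow-up job would get.

    map_tasks  ~ hive.skewjoin.mapjoin.map.tasks (requested task count)
    min_split  ~ hive.skewjoin.mapjoin.min.split (minimum split size, bytes)
    block_size is an assumed file system block size, not a Hive setting.
    """
    # Size each split would have if the requested task count were honored.
    goal_size = total_bytes // max(map_tasks, 1)
    # The minimum split size acts as a floor, which caps the task count.
    split_size = max(min_split, min(goal_size, block_size))
    return math.ceil(total_bytes / split_size)


# 1 GiB of skewed rows with the default settings.
print(estimate_map_tasks(1024 ** 3))
```

Under these assumptions the requested 10000 tasks are never reached for 1 GiB of input, because the 32 MiB minimum split size limits the job to 32 splits. Lowering the requested task count to 4 gives 8 splits, since the assumed block size then bounds the split size.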

...

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time. If there is no skew information in the metadata, this parameter has no effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.

hive.optimize.union.remove

...

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the assumption was that there are few reducers, so the number of files is small anyway. However, this optimization can increase the number of files by a large margin, so the merge is performed aggressively.

...

  • Default Value: nonstrict
  • Added In: Hive 0.3.0

The mode in which the Hive operations are being performed. In strict mode, some risky queries are not allowed to run.

...

  • Default Value: 100000
  • Added In: Hive 0.2.0

Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.
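
The idea of a per-task byte cap can be sketched as below. This is an illustrative model only: the names consume_stderr and StderrLimitExceeded are hypothetical, and how Hive reacts when the limit is exceeded is not described here, so raising an exception is an assumption.

```python
class StderrLimitExceeded(Exception):
    """Raised when a script writes more than the allowed bytes to stderr."""


def consume_stderr(chunks, max_bytes=100000):
    """Count stderr bytes chunk by chunk and stop once the cap is passed.

    `max_bytes` plays the role of the limit described above
    (default 100000). Returns the total number of bytes seen.
    """
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            raise StderrLimitExceeded(
                f"script wrote {total} bytes to stderr, limit is {max_bytes}"
            )
    return total


# A well-behaved script stays under the cap.
print(consume_stderr([b"warn: a\n", b"warn: b\n"], max_bytes=100))
```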

...

  • Default Value: false
  • Added In: Hive 0.5.0

When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input.

...

  • Default Value: HIVE_SCRIPT_OPERATOR_ID
  • Added In: Hive 0.5.0

Name of the environment variable that holds the unique script operator ID in the user's transform function (the custom mapper/reducer that the user has specified in the query).

...

  • Default Value: false
  • Added In: Hive 0.2.0

This controls whether the final outputs of a query (to a local/hdfs file or a Hive table) are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.

...

  • Default Value: false
  • Added In: Hive 0.2.0

This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.

...

  • Default Value: false
  • Added In: Hive 0.5.0

Whether to execute jobs in parallel.

hive.exec.parallel.thread.number
  • Default Value: 8
  • Added In: Hive 0.6.0

How many jobs at most can be executed in parallel.
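
A bounded worker pool is one way to picture this limit. The sketch below is not Hive's scheduler: run_jobs is a hypothetical helper, and it assumes the jobs passed in are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, thread_number=8):
    """Run callables concurrently, never more than `thread_number` at once.

    `thread_number` plays the role of hive.exec.parallel.thread.number
    (default 8). Results are returned in the order the jobs were given.
    """
    with ThreadPoolExecutor(max_workers=thread_number) as pool:
        return list(pool.map(lambda job: job(), jobs))


# Three independent jobs, at most two running at any moment.
print(run_jobs([lambda: 1, lambda: 2, lambda: 3], thread_number=2))
```

Jobs beyond the limit wait in the pool's queue until a worker thread becomes free.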

...

  • Default Value: false
  • Added In: Hive 0.8.0

Whether to provide the row offset virtual column.

...