Page History

...

Default Value: true
Added In: Hive 0.4.0

Whether to enable column pruner.

...

Default Value: true
Added In: Hive 0.4.0

Whether to enable predicate pushdown.

...

Default Value: true
Added In: Hive 0.7.0

Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.
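
As a rough illustration, predicate pushdown can be toggled per session with a SET statement. Only hive.optimize.ppd is named in the text above; the table and column names below are hypothetical and exist only to give the optimizer a filter to move.

```sql
-- Enable predicate pushdown for this session (the documented default is already true).
SET hive.optimize.ppd=true;

-- Hypothetical tables and columns. EXPLAIN prints the plan without running the query,
-- so the placement of the filter operator can be inspected.
EXPLAIN
SELECT a.id, b.value
FROM events a JOIN lookup b ON (a.id = b.id)
WHERE a.ds = '2012-01-01';
```

Running the same EXPLAIN with hive.optimize.ppd set to false should make the difference in filter placement visible. As the description above states, pushdown into storage handlers is ignored in that case.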

...

Default Value: true
Added In: Hive 0.8.1

Whether to transitively replicate predicate filters over equijoin conditions.

...

Default Value: 1000
Added In: Hive 0.2.0

How many rows in the right-most join operand Hive should buffer before
emitting the join result.

...

Default Value: 25000
Added In: Hive 0.5.0

How many rows in the joining tables (except the streaming table)
should be cached in memory.

...

Default Value: 100
Added In: Hive 0.5.0

How many values for each key in the map-joined table should be cached
in memory.

...

Default Value: false
Added In: Hive 0.6.0

Whether to enable skew join optimization. (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key

Default Value: 100000
Added In: Hive 0.6.0

Determines whether a join key is skewed. If the join operator sees more than the specified number of rows with the same key, the key is treated as a skew join key.

...

Default Value: 10000
Added In: Hive 0.6.0

Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split for fine-grained control.

...

Default Value: 33554432
Added In: Hive 0.6.0

Determines the maximum number of map tasks used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks for fine-grained control.
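
A session-level sketch of the runtime skew join settings described above might look as follows. The values are the defaults listed in this section. Pairing 10000 with hive.skewjoin.mapjoin.map.tasks and 33554432 with hive.skewjoin.mapjoin.min.split is inferred from the cross-references in the two descriptions, because the property headings are elided in this excerpt. The minimum split size is assumed to be in bytes, in which case 33554432 = 32 × 1024 × 1024, or 32 MB.

```sql
-- Turn on runtime skew join handling (the documented default is false).
SET hive.optimize.skewjoin=true;

-- Treat a join key as skewed once more than 100000 rows share it.
SET hive.skewjoin.key=100000;

-- Assumed pairing of names and defaults (see the note above).
SET hive.skewjoin.mapjoin.map.tasks=10000;
SET hive.skewjoin.mapjoin.min.split=33554432;  -- 32 MB, assuming bytes
```

Lowering hive.skewjoin.key makes Hive treat more keys as skewed. The two mapjoin settings then bound how the follow-up map join job is split.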

...

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time. If there is no skew information in the metadata, this parameter has no effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.
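
The following sketch shows one way skew information can reach the metastore and how the two parameters are then set together. The table name, columns, and skewed values are hypothetical, and the SKEWED BY clause assumes a Hive version that supports skewed tables.

```sql
-- Hypothetical table: SKEWED BY records the heavily repeated key values in the metastore.
CREATE TABLE page_views (user_id BIGINT, url STRING)
SKEWED BY (user_id) ON (0, 1);

-- Compile-time optimization uses the stored skew information.
SET hive.optimize.skewjoin.compiletime=true;

-- Runtime optimization stays enabled as a fallback for keys with no stored skew information.
SET hive.optimize.skewjoin=true;
```

With this setup, joins on user_id would be planned around the listed values at compile time, and the runtime setting would only matter for skew that the metadata does not describe.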

hive.optimize.union.remove

...

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the assumption was that the number of reducers is small, so the number of files is small anyway. However, this optimization can increase the number of files by a large margin, so Hive merges aggressively.
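
A minimal session sketch of the combination discussed above, using only property names that appear in the text. The boolean values are illustrative and are not defaults taken from this page.

```sql
-- Remove the union and push the following operators above it.
SET hive.optimize.union.remove=true;

-- The case described above: map-only merge on, map-reduce merge off.
-- Because at least one of the two is true, the merge step is still triggered.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=false;
```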

...

Default Value: nonstrict
Added In: Hive 0.3.0

The mode in which the Hive operations are being performed. In strict mode, some risky queries are not allowed to run.

...

Default Value: 100000
Added In: Hive 0.2.0

Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.

...

Default Value: false
Added In: Hive 0.5.0

When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input.

...

Default Value: HIVE_SCRIPT_OPERATOR_ID
Added In: Hive 0.5.0

Name of the environment variable that holds the unique script operator ID in the user's transform function (the custom mapper/reducer that the user has specified in the query).

...

Default Value: false
Added In: Hive 0.2.0

This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.

...

Default Value: false
Added In: Hive 0.2.0

This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.
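
A hedged sketch of how the two compression switches might be combined with the Hadoop variables. The property headings are elided in this excerpt, so the names hive.exec.compress.output and hive.exec.compress.intermediate are assumptions drawn from Hive's configuration rather than from this page. The Gzip codec is only an example choice.

```sql
-- Assumed property names (not shown in this excerpt).
SET hive.exec.compress.output=true;        -- compress final query output
SET hive.exec.compress.intermediate=true;  -- compress files passed between map-reduce jobs

-- Hadoop variables of the form mapred.output.compress* select the codec and related options.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```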

...

Default Value: false
Added In: Hive 0.5.0

Whether to execute jobs in parallel.

hive.exec.parallel.thread.number

Default Value: 8
Added In: Hive 0.6.0

How many jobs at most can be executed in parallel.
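
For illustration, parallel execution could be enabled for a session as sketched below. The switch name hive.exec.parallel is an assumption, since the heading of the preceding entry is elided here. hive.exec.parallel.thread.number and its default of 8 come from the text above.

```sql
-- Assumed name of the on/off switch described above (documented default: false).
SET hive.exec.parallel=true;

-- Allow at most 8 jobs to run at the same time.
SET hive.exec.parallel.thread.number=8;
```

The thread limit only has an effect once parallel execution itself is enabled.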

...

Default Value: false
Added In: Hive 0.8.0

Whether to provide the row offset virtual column.

...

Versions Compared

Old Version 72

New Version 73

Key

hive.skewjoin.key

hive.optimize.union.remove

hive.exec.parallel.thread.number