Page History

...

For local mode, memory of the mappers/reducers.

hive.map.aggr.hash.force.flush.memory.threshold

...

How many values in each key in the map-joined table should be cached in memory.

...

hive.mapjoin.followby.map.aggr.hash.percentmemory

Default Value: 0.3
Added In: Hive 0.7.0

Portion of total memory to be used by the map-side group aggregation hash table when this group by is followed by a map join.
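The share above is simple arithmetic. This sketch assumes that "total memory" means the JVM's maximum heap size, passed in as a number of bytes. It does not show how Hive itself measures memory, and the function name is hypothetical.

```python
# Minimal sketch (assumption): "total memory" is taken to be the JVM's
# maximum heap size, passed in here as a plain number of bytes.
MAPJOIN_FOLLOWBY_MAP_AGGR_HASH_PERCENTMEMORY = 0.3  # default listed above


def group_by_hash_budget(max_heap_bytes: int,
                         fraction: float = MAPJOIN_FOLLOWBY_MAP_AGGR_HASH_PERCENTMEMORY) -> int:
    """Bytes the map-side group-by hash table may occupy."""
    if max_heap_bytes < 0:
        raise ValueError("max_heap_bytes must be non-negative")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(max_heap_bytes * fraction)


if __name__ == "__main__":
    # A 1 GiB heap leaves roughly 307 MiB for the hash table at the default 0.3.
    print(group_by_hash_budget(1024 ** 3))
```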

hive.smalltable.filesize or hive.mapjoin.smalltable.filesize

Default Value: 25000000
Added In: Hive 0.7.0 with HIVE-1642: hive.smalltable.filesize (replaced by hive.mapjoin.smalltable.filesize in Hive 0.8.1)
Added In: Hive 0.8.1 with HIVE-2499: hive.mapjoin.smalltable.filesize

The threshold for the input file size of the small tables. If the file size is smaller than this threshold, Hive will try to convert the common join into a map join.

hive.mapjoin.localtask.max.memory.usage

Default Value: 0.90
Added In: Hive 0.7.0 with HIVE-1808 and HIVE-1642

The fraction of memory the local task can use to hold the key/value pairs in the in-memory hash table. If the local task's memory usage exceeds this fraction, the local task is aborted, because the data of the small table is too large to be held in memory.

hive.mapjoin.followby.gby.localtask.max.memory.usage

Default Value: 0.55
Added In: Hive 0.7.0

The fraction of memory the local task can use to hold the key/value pairs in the in-memory hash table when this map join is followed by a group by. If the local task's memory usage exceeds this fraction, the local task is aborted, because the data of the small table is too large to be held in memory.
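The following is a minimal sketch of the decision that hive.mapjoin.smalltable.filesize controls. It assumes a two-table join in which the smaller input is the one loaded into memory. The function name choose_join and the table names are hypothetical. Hive's real optimizer also handles multi-way joins and hints, and those are not modeled here.

```python
from typing import Mapping, Optional, Tuple

MAPJOIN_SMALLTABLE_FILESIZE = 25_000_000  # bytes; default listed above


def choose_join(input_sizes: Mapping[str, int],
                threshold: int = MAPJOIN_SMALLTABLE_FILESIZE) -> Tuple[str, Optional[str]]:
    """Decide between a common join and a map join for a two-table join.

    Returns ("map join", <table to load into memory>) when the smaller
    input is strictly below the threshold, otherwise ("common join", None).
    """
    if len(input_sizes) != 2:
        raise ValueError("this sketch only models two-table joins")
    small = min(input_sizes, key=lambda name: input_sizes[name])
    if input_sizes[small] < threshold:
        return ("map join", small)
    return ("common join", None)


if __name__ == "__main__":
    print(choose_join({"page_views": 40_000_000_000, "countries": 8_000}))
```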

hive.mapjoin.check.memory.rows

Default Value: 100000
Added In: Hive 0.7.0 with HIVE-1808 and HIVE-1642

The number of rows to process before each check of the memory usage.
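The three memory properties above work together while the local task builds its in-memory hash table. This sketch makes several assumptions. memory_fraction_used is a hypothetical probe that returns the fraction of memory in use; Hive's own check is assumed to measure JVM heap usage. The exception class and function name are also invented for the example.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

LOCALTASK_MAX_MEMORY_USAGE = 0.90     # hive.mapjoin.localtask.max.memory.usage
FOLLOWBY_GBY_MAX_MEMORY_USAGE = 0.55  # hive.mapjoin.followby.gby.localtask.max.memory.usage
CHECK_MEMORY_ROWS = 100_000           # hive.mapjoin.check.memory.rows


class LocalTaskMemoryExceeded(RuntimeError):
    """The small table is too large to be held in memory."""


def build_hash_table(rows: Iterable[Tuple[Hashable, object]],
                     memory_fraction_used: Callable[[], float],
                     followed_by_group_by: bool = False,
                     check_every: int = CHECK_MEMORY_ROWS) -> Dict[Hashable, List[object]]:
    """Load small-table rows into a hash table, aborting when the probe
    reports more memory in use than the configured fraction."""
    limit = FOLLOWBY_GBY_MAX_MEMORY_USAGE if followed_by_group_by else LOCALTASK_MAX_MEMORY_USAGE
    table: Dict[Hashable, List[object]] = {}
    for count, (key, value) in enumerate(rows, start=1):
        table.setdefault(key, []).append(value)  # a key may have many values
        if count % check_every == 0:             # probe memory only every N rows
            used = memory_fraction_used()
            if used > limit:
                raise LocalTaskMemoryExceeded(
                    f"memory usage {used:.2f} > {limit:.2f} after {count} rows")
    return table
```

The lower limit applies when the map join feeds a group by. Presumably this is because the group by needs memory of its own. The sketch encodes only the two limits, not that rationale.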

hive.optimize.skewjoin

Default Value: false
Added In: Hive 0.6.0

Whether to enable skew join optimization. (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key

Default Value: 100000
Added In: Hive 0.6.0

Determines whether a join key is skewed. If the join operator sees more than the specified number of rows with the same key, it treats that key as a skew join key.
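The skew test above amounts to counting rows per join key. This sketch counts keys over a whole input with collections.Counter. That is a simplification: it does not model where or when Hive's join operator performs the check at run time. The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set

SKEWJOIN_KEY = 100_000  # hive.skewjoin.key default


def find_skew_keys(join_keys: Iterable[Hashable], threshold: int = SKEWJOIN_KEY) -> Set[Hashable]:
    """Return the keys seen more than `threshold` times on one side of a join."""
    counts = Counter(join_keys)
    return {key for key, n in counts.items() if n > threshold}
```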

hive.skewjoin.mapjoin.map.tasks

Default Value: 10000
Added In: Hive 0.6.0

Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split for fine-grained control.

hive.skewjoin.mapjoin.min.split

Default Value: 33554432
Added In: Hive 0.6.0

Determines the maximum number of map tasks used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks for fine-grained control.
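One way to see how the two properties above interact is through the split-size rule of Hadoop's old FileInputFormat API: split = max(minimum split size, min(total size / requested map tasks, block size)). This sketch makes three assumptions. First, the follow-up job hands these two values to that rule. Second, the input is treated as a single file. Third, blocks are 128 MiB. Real split computation works per file and differs in the details.

```python
SKEWJOIN_MAPJOIN_MAP_TASKS = 10_000      # hive.skewjoin.mapjoin.map.tasks
SKEWJOIN_MAPJOIN_MIN_SPLIT = 33_554_432  # hive.skewjoin.mapjoin.min.split (32 MiB)
ASSUMED_BLOCK_SIZE = 128 * 1024 * 1024    # assumption: 128 MiB HDFS blocks


def estimate_map_tasks(total_bytes: int,
                       map_tasks: int = SKEWJOIN_MAPJOIN_MAP_TASKS,
                       min_split: int = SKEWJOIN_MAPJOIN_MIN_SPLIT,
                       block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Approximate split count:
    split = max(min_split, min(total / map_tasks, block_size))."""
    if total_bytes <= 0:
        return 0
    goal_size = total_bytes // max(map_tasks, 1)  # size implied by the map-task hint
    split_size = max(min_split, min(goal_size, block_size))
    return -(-total_bytes // split_size)  # ceiling division
```

Every split in this model is at least min_split bytes, so the minimum split size caps the map count at ceil(total / min_split). The map-task value only sets a goal split size. For example, 1 GiB of skewed rows gives 32 tasks with the defaults.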

hive.optimize.skewjoin.compiletime

Whether to create a separate plan for skewed keys for the tables in the join. This is based on the skewed keys stored in the metadata. At compile time, the plan is broken into two joins: one for the skewed keys and one for the remaining keys. A union is then performed on the outputs of these two joins. So, unless the same skewed key is present in both joined tables, the join for the skewed key is performed as a map-side join.

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter has no effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.

hive.optimize.union.remove

Default Value: false
Added In: Hive 0.10.0 with HIVE-3276

Whether to remove the union and push the operators between the union and the filesink above the union. This avoids an extra scan of the output by the union. This is useful on its own for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.
The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. A user who sets hive.merge.mapfiles to true and hive.merge.mapredfiles to false normally assumes that there are few reducers and therefore few output files. However, this optimization can increase the number of files by a large margin, so Hive merges aggressively.

hive.mapred.supports.subdirectories

Default Value: false
Added In: Hive 0.10.0 with HIVE-3276

Whether the running version of Hadoop supports subdirectories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports subdirectories for tables/partitions. This support was added by MAPREDUCE-1501.

hive.mapred.mode

Default Value: nonstrict
Added In: Hive 0.3.0

The mode in which Hive operations are performed. In strict mode, some risky queries are not allowed to run.

hive.exec.script.maxerrsize

Default Value: 100000
Added In: Hive 0.2.0

Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.

hive.exec.script.allow.partial.consumption

Default Value: false
Added In: Hive 0.5.0

When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input.

hive.script.operator.id.env.var

Default Value: HIVE_SCRIPT_OPERATOR_ID
Added In: Hive 0.5.0

Name of the environment variable that holds the unique script operator ID in the user's transform function (the custom mapper/reducer that the user has specified in the query).

hive.exec.compress.output

Default Value: false
Added In: Hive 0.2.0

This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from the Hadoop configuration variables mapred.output.compress*.

hive.exec.compress.intermediate

Default Value: false
Added In: Hive 0.2.0

This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.
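The script properties listed earlier (hive.exec.script.maxerrsize, hive.exec.script.allow.partial.consumption and hive.script.operator.id.env.var) apply to user scripts run through TRANSFORM. The following Python script is a hypothetical example that follows the behavior those entries describe:

- It reads tab-separated rows from standard input and writes rows to standard output.
- It takes the operator ID from HIVE_SCRIPT_OPERATOR_ID, the default variable name.
- It consumes all of its input.
- It writes only one short line to standard error.

```python
import os
import sys
from typing import Iterable, Iterator, Mapping

# Variable named by hive.script.operator.id.env.var (default value shown).
OPERATOR_ID_VAR = "HIVE_SCRIPT_OPERATOR_ID"


def transform(lines: Iterable[str], env: Mapping[str, str]) -> Iterator[str]:
    """Upper-case the second tab-separated column and append the operator ID."""
    operator_id = env.get(OPERATOR_ID_VAR, "unknown")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            fields[1] = fields[1].upper()
        yield "\t".join(fields + [operator_id]) + "\n"


def main() -> None:
    # Read every input row, so the script does not depend on
    # hive.exec.script.allow.partial.consumption being enabled.
    for out_line in transform(sys.stdin, os.environ):
        sys.stdout.write(out_line)
    # Keep stderr output small: hive.exec.script.maxerrsize limits it per task.
    print("transform: done", file=sys.stderr)


if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Suppose the file is saved as upper_second.py (a hypothetical name). It could be shipped with ADD FILE upper_second.py and called with SELECT TRANSFORM(id, name) USING 'python upper_second.py' AS (id, name, op_id) FROM some_table. The table and column names are placeholders, and a python interpreter is assumed to be available on the worker nodes.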


hive.exec.parallel

Default Value: false
Added In: Hive 0.5.0

Whether to execute jobs in parallel.

hive.exec.parallel.thread.number

Default Value: 8
Added In: Hive 0.6.0

How many jobs at most can be executed in parallel.
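When hive.exec.parallel is true, jobs that do not depend on each other may run concurrently, up to hive.exec.parallel.thread.number at a time. This Python sketch models only that cap, using a thread pool of eight workers. The function and stage callables are hypothetical, and Hive's own job launcher is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

EXEC_PARALLEL = False       # hive.exec.parallel default
PARALLEL_THREAD_NUMBER = 8  # hive.exec.parallel.thread.number default


def run_stages(stages: Sequence[Callable[[], T]],
               parallel: bool = EXEC_PARALLEL,
               max_jobs: int = PARALLEL_THREAD_NUMBER) -> List[T]:
    """Run mutually independent stages; results keep submission order."""
    if not parallel:
        return [stage() for stage in stages]  # one job at a time
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(stage) for stage in stages]
        return [future.result() for future in futures]
```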

hive.exec.rowoffset

Default Value: false
Added In: Hive 0.8.0

Whether to provide the row offset virtual column.

hive.task.progress

Default Value: false
Added In: Hive 0.5.0
Removed in: Hive 0.13.0 with HIVE-4518

Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with hive.exec.dynamic.partition set to true.

hive.counters.group.name

Default Value: HIVE
Added In: Hive 0.13.0 with HIVE-4518

Counter group name for counters used during query execution. The counter group is used for internal Hive variables (CREATED_FILE, FATAL_ERROR, and so on).

hive.exec.pre.hooks

Default Value: (empty)
Added In: Hive 0.4.0

Comma-separated list of pre-execution hooks to be invoked for each statement. A pre-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.

hive.exec.post.hooks

Default Value: (empty)
Added In: Hive 0.5.0

Comma-separated list of post-execution hooks to be invoked for each statement. A post-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.

hive.exec.failure.hooks

Default Value: (empty)
Added In: Hive 0.8.0

Comma-separated list of on-failure hooks to be invoked for each statement. An on-failure hook is specified as the name of Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
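The three hook properties above share one format: a comma-separated list of class names. This sketch parses such a value and invokes each entry for a statement. It is a Python stand-in. Real hooks are Java classes implementing org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and are loaded by name, whereas here a dictionary of callables plays that role. The sketch assumes that whitespace and blank entries are ignored and that hooks run in list order. The com.example class names used with it are hypothetical.

```python
from typing import Callable, Dict, List

# Stand-in for Java class loading: maps a class name to a callable.
HookRegistry = Dict[str, Callable[[str], None]]


def parse_hook_list(value: str) -> List[str]:
    """Split a comma-separated hook property into class names (blanks ignored)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def run_hooks(value: str, registry: HookRegistry, statement: str) -> List[str]:
    """Invoke each listed hook, in list order, for one statement."""
    invoked = []
    for name in parse_hook_list(value):
        registry[name](statement)  # a KeyError stands in for "class not found"
        invoked.append(name)
    return invoked
```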

hive.merge.mapfiles

Default Value: true
Added In:

Merge small files at the end of a map-only job.

hive.merge.mapredfiles

Default Value: false
Added In:

Merge small files at the end of a map-reduce job.

hive.mergejob.maponly

Default Value: true
Added In:

Try to generate a map-only job for merging files if CombineHiveInputFormat is supported.

hive.merge.size.per.task

Default Value: 256000000
Added In:

Size of merged files at the end of the job.

hive.merge.smallfiles.avgsize

Default Value: 16000000
Added In:

When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
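The merge rules in the last few entries can be condensed into one predicate. This sketch covers the decision only and uses the defaults listed above. It does not model hive.merge.size.per.task or hive.mergejob.maponly, which shape the merge job itself. The function name is hypothetical.

```python
from typing import Sequence

MERGE_MAPFILES = True                  # hive.merge.mapfiles
MERGE_MAPREDFILES = False              # hive.merge.mapredfiles
MERGE_SMALLFILES_AVGSIZE = 16_000_000  # hive.merge.smallfiles.avgsize, in bytes


def needs_merge_job(output_file_sizes: Sequence[int], map_only: bool,
                    merge_mapfiles: bool = MERGE_MAPFILES,
                    merge_mapredfiles: bool = MERGE_MAPREDFILES,
                    avgsize: int = MERGE_SMALLFILES_AVGSIZE) -> bool:
    """True if an extra job should merge this job's output files."""
    enabled = merge_mapfiles if map_only else merge_mapredfiles
    if not enabled or not output_file_sizes:
        return False
    average = sum(output_file_sizes) / len(output_file_sizes)
    return average < avgsize
```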

hive.heartbeat.interval

Default Value: 1000
Added In:

...


Versions Compared: Old Version 79, New Version 80
