...

Also see Hive Configuration Properties in the Language Manual for non-administrative configuration variables.

Version information: Metrics

A new Hive metrics system based on Codahale was introduced in releases 1.3.0 and 2.0.0 by HIVE-10761. To configure it or revert to the old metrics system, see the Metrics section of Hive Configuration Properties.

Hive Configuration Variables

Variable Name

Description

Default Value

hive.ddl.output.format

The data format to use for DDL output (e.g. DESCRIBE table). One of "text" (for human-readable text) or "json" (for a JSON object). (As of Hive 0.9.0.)

text

hive.exec.script.wrapper

Wrapper around any invocations of the script operator. For example, if this is set to python, the script passed to the script operator is invoked as python <script command>. If the value is null or not set, the script is invoked as <script command>.

null

hive.exec.plan

 

null

hive.exec.scratchdir

This directory is used by Hive to store the plans for the different map/reduce stages of the query, as well as to store the intermediate outputs of these stages.

Hive 0.14.0 and later: HDFS root scratch directory for Hive jobs, which gets created with write-all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/<username> is created with ${hive.scratch.dir.permission}.

/tmp/<user.name>/hive (Hive 0.8.0 and earlier)
/tmp/hive-<user.name> (Hive 0.8.1 to 0.14.0)
/tmp/hive (Hive 0.14.0 and later)

hive.scratch.dir.permission

The permission for the user-specific scratch directories that get created in the root scratch directory ${hive.exec.scratchdir}. (As of Hive 0.12.0.)

700 (Hive 0.12.0 and later)

hive.exec.local.scratchdir

This directory is used for temporary files when Hive runs in local mode. (As of Hive 0.10.0.)

/tmp/<user.name>

hive.exec.submitviachild

Determines whether map/reduce jobs should be submitted through a separate JVM in non-local mode.

false (by default jobs are submitted through the same JVM as the compiler)

hive.exec.script.maxerrsize

Maximum number of serialization errors allowed in a user script invoked through TRANSFORM or MAP or REDUCE constructs.

100000

hive.exec.compress.output

Determines whether the output of the final map/reduce job in a query is compressed or not.

false

hive.exec.compress.intermediate

Determines whether the output of the intermediate map/reduce jobs in a query is compressed or not.

false

hive.resource.use.hdfs.location

Reference HDFS-based files/jars directly instead of copying them to the session-based HDFS scratch directory; see HIVE-17574. (As of Hive 2.2.1.)

true

hive.jar.path

The location of hive_cli.jar that is used when submitting jobs in a separate jvm.

 

hive.aux.jars.path

The location of the plugin jars that contain implementations of user defined functions and SerDes.

 

hive.reloadable.aux.jars.path

The location of plugin jars that can be renewed (added, removed, or updated) by executing the Beeline reload command, without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path for creating UDFs or SerDes. (As of Hive 0.14.0.) 

hive.partition.pruning

A strict value for this variable means that the compiler throws an error if no partition predicate is provided on a partitioned table. This protects against a user inadvertently issuing a query against all the partitions of the table.

nonstrict

hive.map.aggr

Determines whether map-side aggregation is enabled.

true

hive.join.emit.interval

 

1000

hive.map.aggr.hash.percentmemory

 

(float)0.5

hive.default.fileformat

Default file format for CREATE TABLE statement. Options are TextFile, SequenceFile, RCFile, and Orc.

TextFile

hive.merge.mapfiles

Merge small files at the end of a map-only job.

true

hive.merge.mapredfiles

Merge small files at the end of a map-reduce job.

false

hive.merge.size.per.task

Size of merged files at the end of the job.

256000000

hive.merge.smallfiles.avgsize

When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.

16000000

hive.querylog.enable.plan.progress

Whether to log the plan's progress every time a job's progress is checked. These logs are written to the location specified by hive.querylog.location. (As of Hive 0.10.)

true

hive.querylog.location

Directory where structured Hive query logs are created. One file per session is created in this directory. If this variable is set to an empty string, structured logs will not be created.

/tmp/<user.name>

hive.querylog.plan.progress.interval

The interval, in milliseconds, to wait between logging the plan's progress. If there is a whole-number percentage change in the progress of the mappers or the reducers, the progress is logged regardless of this value. The actual interval is the ceiling of (this value divided by the value of hive.exec.counters.pull.interval) multiplied by the value of hive.exec.counters.pull.interval; that is, if this value does not divide evenly by the value of hive.exec.counters.pull.interval, progress is logged less frequently than specified. For example, with a pull interval of 7000, a setting of 60000 results in logging every ceil(60000/7000) * 7000 = 63000 milliseconds. This only has an effect if hive.querylog.enable.plan.progress is set to true. (As of Hive 0.10.)

60000

hive.stats.autogather

A flag to gather statistics automatically during the INSERT OVERWRITE command. (As of Hive 0.7.0.)

true

hive.stats.dbclass

The default database that stores temporary Hive statistics. Valid values are hbase and jdbc, where jdbc should be followed by a colon and the database to use (e.g. jdbc:mysql). (As of Hive 0.7.0.)

jdbc:derby

hive.stats.dbconnectionstring

The default connection string for the database that stores temporary Hive statistics. (As of Hive 0.7.0.)

jdbc:derby:;databaseName=TempStatsStore;create=true

hive.stats.jdbcdriver

The JDBC driver for the database that stores temporary Hive statistics. (As of Hive 0.7.0.)

org.apache.derby.jdbc.EmbeddedDriver

hive.stats.reliable

Whether queries will fail if stats cannot be collected completely and accurately. If this is set to true, reading/writing from/into a partition may fail because the stats could not be computed accurately. (As of Hive 0.10.0.)

false

hive.enforce.bucketing

If enabled, enforces inserts into bucketed tables to also be bucketed. (Hive 0.6.0 through Hive 1.x.x only)

false

hive.variable.substitute

Substitutes variables in Hive statements that were previously set using the set command, system variables, or environment variables; see the example following this table. See HIVE-1096 for details. (As of Hive 0.7.0.)

true

hive.variable.substitute.depth

The maximum replacements the substitution engine will do. (As of Hive 0.10.0.)

40

hive.vectorized.execution.enabled

This flag controls the vectorized mode of query execution as documented in HIVE-4160. (As of Hive 0.13.0.)

false
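
Most of these variables can be set per session with the SET command before a query runs, as well as cluster-wide in hive-site.xml. A minimal sketch of a session-level setup, assuming a Hive CLI or Beeline session and a hypothetical partitioned table src; the property names come from the table above, but the values are illustrative only, not recommendations:

  -- Compress the final job's output and default new tables to ORC.
  SET hive.exec.compress.output=true;
  SET hive.default.fileformat=Orc;

  -- Merge small output files after map-reduce jobs into ~256 MB files.
  SET hive.merge.mapredfiles=true;
  SET hive.merge.size.per.task=256000000;

  -- Variable substitution (enabled by default via hive.variable.substitute).
  SET hivevar:ds=2024-01-01;
  SELECT count(*) FROM src WHERE ds = '${hivevar:ds}';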

Hive Metastore Configuration Variables

Please see Hive Metastore Administration for information about the configuration variables used to set up the metastore in local, remote, or embedded mode. Also see descriptions in the Metastore section of the Language Manual's Hive Configuration Properties.

For security configuration (Hive 0.10 and later), see the Hive Metastore Security section in the Language Manual's Hive Configuration Properties.

Configuration Variables Used to Interact with Hadoop

Variable Name


Description

Default Value

hadoop.bin.path

The location of the Hadoop script which is used to submit jobs to Hadoop when submitting through a separate JVM.

$HADOOP_HOME/bin/hadoop

hadoop.config.dir

The location of the configuration directory of the Hadoop installation.

$HADOOP_HOME/conf

fs.default.name

The default name of the filesystem (for example, localhost for hdfs://<clustername>:8020).

For YARN this configuration variable is called fs.defaultFS.

file:///

map.input.file

The filename the map is reading from.

null

mapred.job.tracker

The URL of the jobtracker. If this is set to local, map/reduce jobs are run in local mode.

local

mapred.reduce.tasks

The number of reducers for each map/reduce stage in the query plan.

1

mapred.job.name

The name of the map/reduce job.

null

mapreduce.input.fileinputformat.split.maxsize

For splittable data this changes the portion of the data that each mapper is assigned. By default, each mapper is assigned based on the block sizes of the source files. Entering a value larger than the block size will decrease the number of splits, which creates fewer mappers. Entering a value smaller than the block size will increase the number of splits, which creates more mappers. (See the example following this table.)

empty

fs.trash.interval

The interval, in minutes, after which a trash checkpoint directory is deleted. (This is also the interval between checkpoints.) The checkpoint directory is located in .Trash under the user's home directory and contains files and directories that were removed since the previous checkpoint.

Any setting greater than 0 enables the trash feature of HDFS.

When using the Transparent Data Encryption (TDE) feature, set this to 0 in Hadoop core-site.xml as documented in HIVE-10978.

0
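
As a hedged sketch of how the Hadoop-side variables above are typically used from a Hive session, the following caps the input split size to get more mappers and fixes the reducer count per stage. The numbers are illustrative, assuming source files with 128 MB blocks:

  -- Cap each input split at 64 MB; with 128 MB blocks this roughly doubles the mappers.
  SET mapreduce.input.fileinputformat.split.maxsize=67108864;

  -- Use 8 reducers for each map/reduce stage in the query plan.
  SET mapred.reduce.tasks=8;

  -- Label the job so it is easy to find in the jobtracker UI.
  SET mapred.job.name=nightly_aggregation;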

Hive Variables Used to Pass Run Time Information

Variable Name

Description

Default Value

hive.session.id

The id of the Hive Session.

 

hive.query.string

The query string passed to the map/reduce job.

 

hive.query.planid

The id of the plan for the map/reduce stage.

 

hive.jobname.length

The maximum length of the jobname.

50

hive.table.name

The name of the Hive table. This is passed to the user scripts through the script operator (see the sketch following this table).

 

hive.partition.name

The name of the Hive partition. This is passed to the user scripts through the script operator.

 

hive.alias

The alias being processed. This is also passed to the user scripts through the script operator.

 
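Several entries above note that they are passed to user scripts through the script operator. A minimal sketch of reading one of them from a TRANSFORM script, assuming a hypothetical table src and a hypothetical script my_script.py, and assuming the usual script-operator behavior of exporting the configuration to the script's environment with non-alphanumeric characters in names replaced by underscores (so hive.table.name arrives as the environment variable hive_table_name):

  -- my_script.py (hypothetical) can read os.environ['hive_table_name'].
  ADD FILE my_script.py;

  SELECT TRANSFORM (key, value)
    USING 'python my_script.py'
    AS (key, value)
  FROM src;
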

...

Starting in Hive release 0.11.0, HCatalog is installed and configured with Hive. The HCatalog server is the same as the Hive metastore.

...