Configuring Hive
A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference:
...
- Using the set command in the CLI to set session-level values for a configuration variable. The value applies to all statements subsequent to the set command. For example, the following sets the scratch directory (which Hive uses to store temporary output and plans) to /tmp/mydir:
set hive.exec.scratchdir=/tmp/mydir;
...
bin/hive -hiveconf hive.exec.scratchdir=/tmp/mydir
...
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
hive-default.xml.template contains the default values for various configuration variables that come prepackaged in a Hive distribution. In order to override any of the values, create hive-site.xml instead and set the value in that file as shown above. Please note that this template file is not used by Hive at all (as of Hive 0.9.0) and so it might be out of date or out of sync with the actual values. The canonical list of configuration options is now only managed in the HiveConf Java class.
hive-default.xml.template is located in the conf directory in your installation root. hive-site.xml should also be created in the same directory.
The administrative configuration variables are listed below.
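As a minimal sketch of the hive-site.xml override described above, the script below writes a site file containing only the property shown earlier. The conf_dir variable is a hypothetical stand-in for the conf directory under your installation root (here a throwaway temporary directory), and the enclosing configuration element is the usual root element of Hadoop-style configuration files.

```shell
#!/bin/sh
set -eu

# conf_dir is a stand-in (assumption) for the conf directory in the
# installation root; a temporary directory keeps the sketch harmless.
conf_dir="$(mktemp -d)/conf"
mkdir -p "$conf_dir"

# Write a site file that overrides a single variable.
cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/mydir</value>
    <description>Scratch space for Hive jobs</description>
  </property>
</configuration>
EOF

# Show the result so it can be reviewed before copying it into place.
cat "$conf_dir/hive-site.xml"
```

Only the variables you want to change need to appear in the site file; everything else keeps its built-in default.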
Temporary Folders
Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query finishes. However, if the Hive client terminates abnormally, some data may be left behind. The configuration details are as follows:
- On the HDFS cluster this is set to /tmp/hive-<username>
...
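Because abnormal client termination can leave temporary data behind, it can help to look for old entries periodically. The following is a rough sketch only: scratch_root is a hypothetical local stand-in for a scratch location, the 7-day threshold is an arbitrary assumption, and the script lists candidates without deleting anything.

```shell
#!/bin/sh
set -eu

# scratch_root is a hypothetical stand-in for a scratch location on a
# local file system; it is created here so the sketch is self-contained.
scratch_root="$(mktemp -d)"
mkdir "$scratch_root/old-query" "$scratch_root/new-query"

# Backdate one directory so it looks like a leftover from a crashed client.
touch -t 200001010000 "$scratch_root/old-query"

# List (do not delete) top-level entries not modified in the last 7 days.
stale="$(find "$scratch_root" -mindepth 1 -maxdepth 1 -mtime +7)"
echo "$stale"
```

Review the listed entries and confirm that no query is still running before removing anything by hand.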
Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
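The write-then-move behavior described above can be illustrated with a small local sketch. This is not Hive's implementation: the directory names and the part-00000 file name are hypothetical, and a local temporary directory stands in for the target table's file system.

```shell
#!/bin/sh
set -eu

# Hypothetical local stand-ins: one root holds both the scratch directory
# and the table directory, so both sit on the same file system.
root="$(mktemp -d)"
scratch_dir="$root/scratch"
table_dir="$root/warehouse/mytable"
mkdir -p "$scratch_dir" "$table_dir"

# Step 1: write the output to the scratch location first.
printf 'row1\nrow2\n' > "$scratch_dir/part-00000"

# Step 2: move the finished file into the target table directory.
mv "$scratch_dir/part-00000" "$table_dir/part-00000"

ls "$table_dir"
```

Writing to a scratch location first means readers of the target directory never see a partially written file. If the writer dies before step 2, the partial output stays in the scratch location, which is why leftover temporary data can accumulate there.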
h3. Log Files
The Hive client produces logs and history files on the client machine. Please see [Error Logs|GettingStarted#Error Logs] for configuration details.
h3. Derby Server Mode
[Derby|http://db.apache.org/derby/] is the default database for the Hive metastore ([Metadata Store|GettingStarted#Metadata Store]). To run Derby as a network server for multiple users, see [Hive Using Derby in Server Mode|HiveDerbyServerMode].
h3. Configuration Variables
Broadly, the configuration variables for Hive administration are categorized into:
{toc-zone|location=top}
Also see [Hive Configuration Properties|Configuration Properties] in the [Language Manual|LanguageManual] for non-administrative configuration variables.
h4. Hive Configuration Variables
|| Variable Name || Description || Default Value ||
| ... | ... ([HIVE-2822|https://issues.apache.org/jira/browse/HIVE-2822]) | text |
| hive.exec.script.wrapper | ... | ... |
| ... | ... | null |
| ... | ... | ... |
| hive.exec.local.scratchdir | ... ([HIVE-1577|https://issues.apache.org/jira/browse/HIVE-1577]) | ... |
| ... | ... | ... |
| hive.exec.script.maxerrsize | ... | ... |
| hive.exec.compress.output | ... | ... |
| hive.exec.compress.intermediate | ... | ... |
| ... | ... | ... |
| ... | ... | 1000 |
| hive.map.aggr.hash.percentmemory | ... | ... |
| ... | ... | ... |
| hive.merge.smallfiles.avgsize | ... | 16000000 |
| hive.querylog.enable.plan.progress | ... ([HIVE-3230|https://issues.apache.org/jira/browse/HIVE-3230]) | true |
| hive.querylog.location | ... | ... |
| hive.querylog.plan.progress.interval | ... {{hive.exec.counters.pull.interval}} ... {{hive.querylog.enable.plan.progress}} ... ([HIVE-3230|https://issues.apache.org/jira/browse/HIVE-3230]) | 60000 |
| hive.stats.autogather | ... ([HIVE-1361|https://issues.apache.org/jira/browse/HIVE-1361]) | true |
| hive.stats.dbclass | ... ([HIVE-1361|https://issues.apache.org/jira/browse/HIVE-1361]) | ... |
| hive.stats.dbconnectionstring | ... ([HIVE-1361|https://issues.apache.org/jira/browse/HIVE-1361]) | jdbc:derby:;databaseName=TempStatsStore;create=true |
| ... | ... ([HIVE-1361|https://issues.apache.org/jira/browse/HIVE-1361]) | org.apache.derby.jdbc.EmbeddedDriver |
| ... | ... ([HIVE-1653|https://issues.apache.org/jira/browse/HIVE-1653]) | false |
| hive.enforce.bucketing | ... | false |
| hive.variable.substitute | ... [HIVE-1096|https://issues.apache.org/jira/browse/HIVE-1096] for details. (as of Hive 0.7.0) | true |
| hive.variable.substitute.depth | ... ([HIVE-2021|https://issues.apache.org/jira/browse/HIVE-2021]) | 40 |
h4. Hive Metastore Configuration Variables
Please see the [Admin Manual's section on the Metastore|AdminManual MetastoreAdmin] for details.
For security configuration (Hive 0.10 and later), see the [Hive Metastore Security section|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveMetastoreSecurity] in the Language Manual's [Configuration Properties|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties].
h4. Hive Configuration Variables Used to Interact with Hadoop
|*Variable Name*|*Description*|*Default Value*|
|hadoop.bin.path|The location of hadoop script which is used to submit jobs to hadoop when submitting through a separate jvm.|$HADOOP_HOME/bin/hadoop|
|...|...|...|
|...|...|null|
|...|...|...|
|...|...|null|
h4. Hive Variables Used to Pass Run Time Information
|*Variable Name*|*Description*|*Default Value*|
|hive.session.id|...|...|
|...|...|...|
{toc-zone}
h2. Configuring HCatalog and WebHCat
For information about configuring HCatalog and WebHCat, see:
* [HCatalog Installation from Tarball|HCatalog InstallHCat]
* [WebHCat Configuration|WebHCat Configure]