Configuring Hive
A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference:
- Using the set command in the CLI or Beeline to set session-level values for a configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to /tmp/mydir for all subsequent statements:
    set hive.exec.scratchdir=/tmp/mydir;
- Using the --hiveconf option of the hive command (in the CLI) or the beeline command for the entire session. For example:
    bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir
- In hive-site.xml. This is used for setting values for the entire Hive configuration (see hive-site.xml and hive-default.xml.template below). For example:
    <property>
      <name>hive.exec.scratchdir</name>
      <value>/tmp/mydir</value>
      <description>Scratch space for Hive jobs</description>
    </property>
- In server-specific configuration files (supported starting Hive 0.14). You can set metastore-specific configuration values in hivemetastore-site.xml, and HiveServer2-specific configuration values in hiveserver2-site.xml.
The server-specific configuration file is useful in two situations:
- You want a different configuration for one type of server (for example – enabling authorization only in HiveServer2 and not CLI).
- You want to set a configuration value only in a server-specific configuration file (for example – setting the metastore database password only in the metastore server configuration file).
The HiveMetastore server reads hive-site.xml as well as hivemetastore-site.xml configuration files that are available in the $HIVE_CONF_DIR or in the classpath. If the metastore is being used in embedded mode (i.e., hive.metastore.uris is not set or empty) in the hive command line or HiveServer2, hivemetastore-site.xml gets loaded by the parent process as well. The value of hive.metastore.uris is examined to determine this, and the value should be set appropriately in hive-site.xml.
Certain metastore configuration parameters like hive.metastore.sasl.enabled, hive.metastore.kerberos.principal, hive.metastore.execute.setugi, and hive.metastore.thrift.framed.transport.enabled are used by the metastore client as well as the server. For such common parameters it is better to set the values in hive-site.xml, which helps keep them consistent.
HiveServer2 reads hive-site.xml as well as hiveserver2-site.xml that are available in the $HIVE_CONF_DIR or in the classpath. If HiveServer2 is using the metastore in embedded mode, hivemetastore-site.xml is also loaded.
The order of precedence of the config files is as follows (later ones have higher precedence):
    hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> '-hiveconf' command-line parameters
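The file-loading and precedence rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not Hive's actual loader: the function names, the "metastore"/"hiveserver2" process labels, and the idea of passing file contents as in-memory strings are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Precedence order taken from the text above; later entries win.
FILE_ORDER = ["hive-site.xml", "hivemetastore-site.xml", "hiveserver2-site.xml"]


def parse_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def is_embedded_metastore(props):
    """Embedded mode: hive.metastore.uris is not set or is empty."""
    return not props.get("hive.metastore.uris", "").strip()


def files_for(process, hive_site_props):
    """List the files a process reads, lowest precedence first.

    `process` is a hypothetical label: "metastore" or "hiveserver2".
    """
    wanted = {"hive-site.xml"}
    if process == "metastore":
        wanted.add("hivemetastore-site.xml")
    elif process == "hiveserver2":
        wanted.add("hiveserver2-site.xml")
        if is_embedded_metastore(hive_site_props):
            wanted.add("hivemetastore-site.xml")
    else:
        raise ValueError(f"unknown process label: {process}")
    return [name for name in FILE_ORDER if name in wanted]


def effective_config(file_contents, process, hiveconf_overrides=None):
    """Merge the files in precedence order, then apply -hiveconf values last."""
    hive_site = {}
    if "hive-site.xml" in file_contents:
        hive_site = parse_properties(file_contents["hive-site.xml"])
    merged = {}
    for filename in files_for(process, hive_site):
        if filename in file_contents:
            merged.update(parse_properties(file_contents[filename]))
    merged.update(hiveconf_overrides or {})
    return merged
```

Calling `effective_config(files, "hiveserver2", {"hive.exec.scratchdir": "/tmp/cli"})` with `files` mapping file names to XML text returns a dictionary in which the command-line value overrides anything set in the files.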
hive-site.xml and hive-default.xml.template
hive-default.xml.template contains the default values for the configuration variables that come prepackaged in a Hive distribution. To override any of the values, create hive-site.xml instead and set the value in that file as shown above.
hive-default.xml.template is located in the conf directory in your installation root, and hive-site.xml should also be created in the same directory.
Please note that the template file hive-default.xml.template is not used by Hive at all (as of Hive 0.9.0); the canonical list of configuration options is only managed in the HiveConf java class. The template file has the formatting needed for hive-site.xml, so you can paste configuration variables from the template file into hive-site.xml and then change their values to the desired configuration.
In Hive releases 0.9.0 through 0.13.1, the template file does not necessarily contain all configuration options found in HiveConf.java, and some of its values and descriptions might be out of date or out of sync with the actual values and descriptions. However, as of Hive 0.14.0 the template file is generated directly from HiveConf.java and therefore it is a reliable source for configuration variables and their defaults.
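As a rough illustration of copying variables out of the template and changing their values, the following Python sketch builds hive-site.xml text from template text plus a dictionary of overrides. The function name `build_hive_site` is hypothetical, and the choice to write a property without a description when the template does not list it is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def build_hive_site(template_xml, overrides):
    """Return hive-site.xml text containing only the overridden properties.

    Descriptions are copied from the template when the template has the
    property; properties missing from the template are written without one.
    """
    template = ET.fromstring(template_xml)
    by_name = {p.findtext("name"): p for p in template.iter("property")}

    site = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(site, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        source = by_name.get(name)
        description = source.findtext("description") if source is not None else None
        if description:
            ET.SubElement(prop, "description").text = description
    return ET.tostring(site, encoding="unicode")
```

The returned string can be written to a hive-site.xml file in the conf directory next to the template.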
The administrative configuration variables are listed below. User variables are listed in Hive Configuration Properties. As of Hive 0.14.0 you can display information about a configuration variable with the SHOW CONF command.
Temporary Folders
Hive uses temporary folders both on the machine running the Hive client and the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the hive client when the query is finished. However, in cases of abnormal hive client termination, some data may be left behind. The configuration details are as follows:
- On the HDFS cluster, the folder is /tmp/hive-<username> by default and is controlled by the configuration variable hive.exec.scratchdir.
- On the client machine, the folder is hardcoded to /tmp/<username>.
Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
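The write-then-move pattern described above can be sketched on a local filesystem in Python. This is an analogy only, not Hive's code: `write_then_move` and `default_scratch_dirs` are hypothetical names, and the sketch assumes the scratch directory and the target are on the same filesystem, which matches the text's statement that the temporary location is on the target table's filesystem.

```python
import os
import tempfile


def default_scratch_dirs(username):
    """Default temporary locations as described in the text above."""
    return {"hdfs": f"/tmp/hive-{username}", "client": f"/tmp/{username}"}


def write_then_move(target_path, data, scratch_dir):
    """Write data to a temporary file under scratch_dir, then move it into place."""
    os.makedirs(scratch_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=scratch_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The move only happens after the write has completed.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temporary file if the write or the move failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed between the write and the move, the temporary file stays behind in the scratch directory, which mirrors the leftover data mentioned above for abnormal client termination.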
Log Files
Hive client produces logs and history files on the client machine. Please see Hive Logging for configuration details.
For WebHCat logs, see Log Files in the WebHCat manual.
Derby Server Mode
Derby is the default database for the Hive metastore (Metadata Store). To run Derby as a network server for multiple users, see Hive Using Derby in Server Mode.
Configuration Variables
Broadly the configuration variables for Hive administration are categorized into:
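- Hive Configuration Variables
- Hive Metastore Configuration Variables
- Configuration Variables Used to Interact with Hadoop
- Hive Variables Used to Pass Run Time Information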
Also see Hive Configuration Properties in the Language Manual for non-administrative configuration variables.
Hive Configuration Variables
...
Hive Metastore Configuration Variables
Please see Hive Metastore Administration for information about the configuration variables used to set up the metastore in local, remote, or embedded mode. Also see descriptions in the Metastore section of the Language Manual's Hive Configuration Properties.
For security configuration (Hive 0.10 and later), see the Hive Metastore Security section in the Language Manual's Hive Configuration Properties.
Configuration Variables Used to Interact with Hadoop
...
Hive Variables Used to Pass Run Time Information
...
Removing Hive Metastore Password from Hive Configuration
Support for this was added in Hive 0.14.0 with HIVE-7634 and HADOOP-10904. By setting up a CredentialProvider to handle storing/retrieval of passwords, you can remove the need to keep the Hive metastore password in cleartext in the Hive configuration.
- Set up the CredentialProvider to store the Hive Metastore password, using the key javax.jdo.option.ConnectionPassword (the same key as used in the Hive configuration). For example, the following command adds the metastore password to a JCEKS keystore file at /usr/lib/hive/conf/hive.jceks:
    $ hadoop credential create javax.jdo.option.ConnectionPassword -provider jceks://file/usr/lib/hive/conf/hive.jceks
    Enter password:
    Enter password again:
    javax.jdo.option.ConnectionPassword has been successfully created.
    org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
  Make sure to restrict access to this file to just the user running the Hive Metastore server/HiveServer2.
  See http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#credential for more information.
- Update the Hive configuration to use the designated CredentialProvider. For example, to use our /usr/lib/hive/conf/hive.jceks file:
    <!-- Configure credential store for passwords -->
    <property>
      <name>hadoop.security.credential.provider.path</name>
      <value>jceks://file/usr/lib/hive/conf/hive.jceks</value>
    </property>
  This configures the CredentialProvider used by http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#getPassword(java.lang.String), which is used by Hive to retrieve the metastore password.
- Remove the Hive Metastore password entry (javax.jdo.option.ConnectionPassword) from the Hive configuration. The CredentialProvider will be used instead.
- Restart Hive Metastore Server/HiveServer2.
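Before restarting, it may help to confirm that the configuration matches the steps above. The Python sketch below checks Hive configuration XML text for the two conditions the steps describe: the cleartext password entry is gone and the credential provider path is set. The function name `credential_setup_problems` is hypothetical, and the check does not contact the keystore or validate the provider path itself.

```python
import xml.etree.ElementTree as ET

PASSWORD_KEY = "javax.jdo.option.ConnectionPassword"
PROVIDER_KEY = "hadoop.security.credential.provider.path"


def credential_setup_problems(config_xml):
    """Return a list of problems found in the configuration text (empty if none)."""
    root = ET.fromstring(config_xml)
    props = {
        p.findtext("name"): (p.findtext("value") or "")
        for p in root.iter("property")
    }
    problems = []
    if PASSWORD_KEY in props:
        problems.append(f"{PASSWORD_KEY} is still present in the configuration")
    if not props.get(PROVIDER_KEY, "").strip():
        problems.append(f"{PROVIDER_KEY} is not set")
    return problems
```

An empty list means both conditions hold for the text that was checked; any other files read by the same server would need the same check.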
Configuring HCatalog and WebHCat
HCatalog
Starting in Hive release 0.11.0, HCatalog is installed and configured with Hive. The HCatalog server is the same as the Hive metastore.
- See Hive Metastore Administration for metastore configuration properties.
- See HCatalog Installation from Tarball for additional information.
For Hive releases prior to 0.11.0, see the "Thrift Server Setup" section in the HCatalog 0.5.0 document Installation from Tarball.
WebHCat
For information about configuring WebHCat, see WebHCat Configuration.