WebHCat Configuration
Configuration Files
The configuration for WebHCat (Templeton) merges the normal Hadoop configuration with the WebHCat-specific variables. Because WebHCat is designed to connect services that are not normally connected, the configuration is more complex than might be desirable.
The WebHCat-specific configuration is split into two layers:
- webhcat-default.xml – All the configuration variables that WebHCat needs. This file sets the defaults that ship with WebHCat and should only be changed by WebHCat developers. Do not copy this file or change it to maintain local installation settings. Because webhcat-default.xml is present in the WebHCat war file, editing a local copy of it will not change the configuration.
- webhcat-site.xml – The (possibly empty) configuration file in which the system administrator can set variables for their Hadoop cluster. Create this file and maintain entries in it for configuration variables that require you to override default values based on your local installation.
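As a rough illustration, a local webhcat-site.xml that overrides a single default might look like the sketch below. It assumes the file follows the usual Hadoop configuration layout (configuration, property, name, and value elements). The port number is a hypothetical value chosen for the example, not a recommended setting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- webhcat-site.xml: local overrides only; webhcat-default.xml stays untouched. -->
<configuration>
  <property>
    <name>templeton.port</name>
    <!-- Hypothetical value for illustration. -->
    <value>8080</value>
  </property>
</configuration>
```

Variables that are not listed in this file keep the values set in webhcat-default.xml.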
Note: The WebHCat server must be restarted after any change to the configuration.
The configuration files are loaded in this order, with later files overriding earlier ones:
- webhcat-default.xml
- webhcat-site.xml
To find the configuration files, WebHCat first attempts to load a file from the CLASSPATH and then looks in the directory specified in the TEMPLETON_HOME environment variable.
Configuration files can reference any environment variable through the special variable env. For example, the Pig executable could be specified using:
${env.PIG_HOME}/bin/pig
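In a configuration file, that expression would appear as a property value. The fragment below is a sketch that assumes the Hadoop-style property layout and pairs the expression with templeton.pig.path from the table of configuration variables. It also assumes PIG_HOME is set in the environment of the WebHCat server process.

```xml
<property>
  <name>templeton.pig.path</name>
  <!-- Resolved from the PIG_HOME environment variable (assumed to be set). -->
  <value>${env.PIG_HOME}/bin/pig</value>
</property>
```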
Configuration variables that take a filesystem path ship with reasonable defaults. If there is any uncertainty, however, it is always safe to specify the full path.
Log File Location
The webhcat-log4j.properties file sets the location of the log files created by WebHCat and some other properties of the logging system.
Configuration Variables
Some default values for configuration variables depend on the release number. Defaults shown here are for the version of WebHCat that is included in Hive release 0.11.0. Defaults for the previous release are shown in the HCatalog 0.5.0 documentation.
Name | Default (Hive 0.11.0) | Description |
---|---|---|
templeton.port | | The HTTP port for the main server. |
templeton.hadoop.config.dir | | The path to the Hadoop configuration. |
templeton.jar | | The path to the WebHCat jar file. |
templeton.libjars | | Jars to add to the classpath. |
templeton.override.jars | | Jars to add to the |
templeton.override.enabled | | Enable the override path in templeton.override.jars. |
templeton.streaming.jar | | The HDFS path to the Hadoop streaming jar file. |
templeton.hadoop | | The path to the Hadoop executable. |
templeton.pig.archive | | The path to the Pig archive. |
templeton.pig.path | | The path to the Pig executable. |
templeton.hcat | | The path to the HCatalog executable. |
templeton.hive.archive | | The path to the Hive archive. |
templeton.hive.path | | The path to the Hive executable. |
templeton.hive.properties | | Properties to set when running Hive. To use it in a cluster with Kerberos security enabled, set |
templeton.exec.encoding | | The encoding of the stdout and stderr data. |
templeton.exec.timeout | | How long in milliseconds a program is allowed to run on the WebHCat box. |
templeton.exec.max-procs | | The maximum number of processes allowed to run at once. |
templeton.exec.max-output-bytes | | The maximum number of bytes from stdout or stderr stored in RAM. |
templeton.controller.mr.child.opts | | Java options to be passed to the WebHCat controller map task. |
templeton.exec.envs | | The environment variables passed through to exec. |
templeton.zookeeper.hosts | | ZooKeeper servers, as comma-separated host:port pairs. |
templeton.zookeeper.session-timeout | | ZooKeeper session timeout in milliseconds. |
templeton.callback.retry.interval | | How long to wait between callback retry attempts in milliseconds. |
templeton.callback.retry.attempts | | How many times to retry the callback. |
templeton.storage.class | | The class to use as storage. |
templeton.storage.root | | The path to the directory to use for storage. |
templeton.hdfs.cleanup.interval | | The maximum delay between a thread's cleanup checks. |
templeton.hdfs.cleanup.maxage | | The maximum age of a WebHCat job. |
templeton.zookeeper.cleanup.interval | | The maximum delay between a thread's cleanup checks. |
templeton.zookeeper.cleanup.maxage | | The maximum age of a WebHCat job. |
templeton.kerberos.secret | A random value | The secret used to sign the HTTP cookie value. The default value is a random value. Unless multiple WebHCat instances need to share the secret, the random value is adequate. |
templeton.kerberos.principal | None | The Kerberos principal to be used by the server. As stated by the Kerberos SPNEGO specification, it should be |
templeton.kerberos.keytab | None | The keytab file containing the credentials for the Kerberos principal. |
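
To show how entries from the table translate into a site file, here is a sketch of two ZooKeeper overrides in webhcat-site.xml. The host names, port, and timeout are hypothetical placeholders, and the Hadoop-style property layout is assumed.

```xml
<configuration>
  <property>
    <name>templeton.zookeeper.hosts</name>
    <!-- Comma-separated host:port pairs; these hosts are placeholders. -->
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
  <property>
    <name>templeton.zookeeper.session-timeout</name>
    <!-- Milliseconds; hypothetical value. -->
    <value>30000</value>
  </property>
</configuration>
```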