HiveServer2
HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.
- The Thrift interface definition language (IDL) for HiveServer2 is available at https://github.com/apache/hive/blob/trunk/service/if/TCLIService.thrift.
- Thrift documentation is available at http://thrift.apache.org/docs/.
This document describes how to set up the server. How to use a client with this server is described in the HiveServer2 Clients document.
Version
Introduced in Hive version 0.11. See HIVE-2935.
How to Configure
Configuration Properties in the hive-site.xml
File
hive.server2.thrift.min.worker.threads – Minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads – Maximum number of worker threads, default 500.
hive.server2.thrift.port – TCP port number to listen on, default 10000.
hive.server2.thrift.bind.host – TCP interface to bind to.
See HiveServer2 in the Configuration Properties document for additional properties that can be set for HiveServer2.
Optional Environment Settings
HIVE_SERVER2_THRIFT_BIND_HOST – Optional TCP host interface to bind to. Overrides the configuration file setting.
HIVE_SERVER2_THRIFT_PORT – Optional TCP port number to listen on, default 10000. Overrides the configuration file setting.
Running in HTTP mode
HiveServer2 provides support for sending Thrift RPC messages over HTTP transport (Hive 0.13 onward, see HIVE-4752). This is particularly useful to support a proxying intermediary between the client and the server (for example, for load balancing or security reasons). Currently, you can run HiveServer2 in either TCP mode or the HTTP mode, but not in both. For the corresponding JDBC URL, check this link: HiveServer2 Clients -- JDBC Connection URLs. Use the following settings to enable HTTP mode:
Optional Environment Settings
hive.server2.thrift.http.port – HTTP port number to listen on; default is 10001.
hive.server2.thrift.http.path – The service endpoint; default is cliservice.
hive.server2.thrift.http.min.worker.threads – Minimum worker threads in the server pool; default is 5.
hive.server2.thrift.http.max.worker.threads – Maximum worker threads in the server pool; default is 500.
Optional Global Init File
A global init file can be placed in the configured hive.server2.global.init.file.location location (Hive 0.14 onward, see HIVE-5160, HIVE-7497, and HIVE-8138). This can be either the path to the init file itself, or a directory where an init file named ".hiverc" is expected.
The init file lists a set of commands that will run for users of this HiveServer2 instance, such as register a standard set of jars and functions.
Logging Configuration
HiveServer2 operation logs are available for Beeline clients (Hive 0.14 onward). These parameters configure logging:
- hive.server2.logging.operation.enabled
- hive.server2.logging.operation.log.location
- hive.server2.logging.operation.verbose (Hive 0.14 to 1.1)
- hive.server2.logging.operation.level (Hive 1.2 onward)
How to Start
$HIVE_HOME/bin/hiveserver2
OR
$HIVE_HOME/bin/hive --service hiveserver2
Usage Message
The -H
or --help
option displays a usage message, for example:
$HIVE_HOME/bin/hive --service hiveserver2 -H Starting HiveServer2 usage: hiveserver2 -H,--help Print help information --hiveconf <property=value> Use value for given property
Authentication/Security Configuration
HiveServer2 supports Anonymous (no authentication) with and without SASL, Kerberos (GSSAPI), pass through LDAP, Pluggable Custom Authentication and Pluggable Authentication Modules (PAM, supported Hive 0.13 onwards).
Configuration
Authentication mode:
hive.server2.authentication – Authentication mode, default NONE. Options are NONE (uses plain SASL), NOSASL, KERBEROS, LDAP, PAM and CUSTOM.
Set following for KERBEROS mode:
hive.server2.authentication.kerberos.principal – Kerberos principal for server.
hive.server2.authentication.kerberos.keytab – Keytab for server principal.
Set following for LDAP mode:
hive.server2.authentication.ldap.url – LDAP URL (for example, ldap://hostname.com:389).
hive.server2.authentication.ldap.baseDN – LDAP base DN. (Optional for AD.)
hive.server2.authentication.ldap.Domain – LDAP domain. (Hive 0.12.0 and later.)
See User and Group Filter Support with LDAP Atn Provider in HiveServer2 for other LDAP configuration parameters in Hive 1.3.0 and later.
Set following for CUSTOM mode:
hive.server2.custom.authentication.class – Custom authentication class that implements the org.apache.hive.service.auth.PasswdAuthenticationProvider
interface.
For PAM mode, see details in section on PAM below.
Impersonation
By default HiveServer2 performs the query processing as the user who submitted the query. But if the following parameter is set to false, the query will run as the user that the hiveserver2
process runs as.
hive.server2.enable.doAs – Impersonate the connected user, default true.
To prevent memory leaks in unsecure mode, disable file system caches by setting the following parameters to true (see HIVE-4501):
fs.hdfs.impl.disable.cache – Disable HDFS filesystem cache, default false.
fs.file.impl.disable.cache – Disable local filesystem cache, default false.
Integrity/Confidentiality Protection
Integrity protection and confidentiality protection (beyond just the default of authentication) for communication between the Hive JDBC driver and HiveServer2 are enabled (Hive 0.12 onward, see HIVE-4911). You can use the SASL QOP property to configure this.
- This is only when Kerberos is used for the HS2 client (JDBC/ODBC application) authentication with HiveServer2.
- hive.server2.thrift.sasl.qop in
hive-site.xml
has to be set to one of the valid QOP values ('auth', 'auth-int' or 'auth-conf').
SSL Encryption
Support is provided for SSL encryption (Hive 0.13 onward, see HIVE-5351). To enable, set the following configurations in hive-site.xml
:
hive.server2.use.SSL – Set this to true.
hive.server2.keystore.path – Set this to your keystore path.
hive.server2.keystore.password – Set this to your keystore password.
Pluggable Authentication Modules (PAM)
Support is provided for PAM (Hive 0.13 onward, see HIVE-6466). To configure PAM:
- Download the JPAM native library for the relevant architecture.
- Unzip and copy libjpam.so to a directory (<libjmap-directory>) on the system.
- Add the directory to the LD_LIBRARY_PATH environment variable like so:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<libjmap-directory>
For some PAM modules, you'll have to ensure that your
/etc/shadow
and/etc/login.defs
files are readable by the user running the HiveServer2 process.
Finally, set the following configurations in hive-site.xml
:
hive.server2.authentication – Set this to PAM.
hive.server2.authentication.pam.services – Set this to a list of comma-separated PAM services that will be used. Note that a file with the same name as the PAM service must exist in /etc/pam.d.
Python Client Driver
A Python client driver for HiveServer2 is available at https://github.com/BradRuderman/pyhs2 (thanks, Brad). It includes all the required packages such as SASL and Thrift wrappers.
The driver has been certified for use with Python 2.6 and newer.
To use the pyhs2 driver:
pip install pyhs2
and then:
import pyhs2 with pyhs2.connect(host='localhost', port=10000, authMechanism="PLAIN", user='root', password='test', database='default') as conn: with conn.cursor() as cur: #Show databases print cur.getDatabases() #Execute query cur.execute("select * from table") #Return column info from query print cur.getSchema() #Fetch table results for i in cur.fetch(): print i
You can discuss this driver on the user@hive.apache.org mailing list.
Ruby Client Driver
A Ruby client driver is available on github at https://github.com/forward3d/rbhive.