Metastore 3.0 Administration
Version Note
This document applies only to the Metastore in Hive 3.0 and later releases. For Hive 0.x, 1.x, and 2.x releases please see the Metastore Administration document.
Introduction
The definitions of Hive objects such as databases, tables, and functions are stored in the Metastore. Depending on how the system is configured, statistics and authorization data may also be stored there. Hive and other execution engines can then use this data at runtime to determine how to execute user queries efficiently.
The Metastore persists the object definitions to a relational database (RDBMS) via DataNucleus, a Java JDO-based Object Relational Mapping (ORM) layer. See Supported RDBMSs below for the list of databases that can be used.
The Metastore can be configured to embed the Derby RDBMS or to connect to an external RDBMS. It can be embedded entirely in a user process or run as a service for other processes to connect to. Each of these options is discussed in turn below.
Changes From Hive 2.x to Hive 3.0
Beginning in Hive 3.0, the Metastore can be run without the rest of Hive being installed. It is provided as a separate release so that non-Hive systems can easily integrate with it. (It is, however, still included in the Hive release for convenience.) Making the Metastore a standalone service involved changing a number of configuration variable names and tool names. To maximize backwards compatibility, all of the old configuration variables and tools still work for previously existing values and functions. This document covers both the old and new names. New functionality will be added only under the new names.
For details on using the Metastore without Hive, see Running the Metastore Without Hive below.
General Configuration
The Metastore reads its configuration from the file metastore-site.xml. It expects to find this file in $METASTORE_HOME/conf, where $METASTORE_HOME is an environment variable. For backwards compatibility it will also read any hive-site.xml or hivemetastore-site.xml files found in $HIVE_HOME/conf. Configuration options can also be defined on the command line (see Starting and Stopping the Service below).
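As a rough illustration of the file format, a minimal metastore-site.xml might look like the sketch below. It assumes the same Hadoop-style configuration/property layout used by hive-site.xml. The warehouse URI is a hypothetical placeholder, not a default.

```xml
<?xml version="1.0"?>
<!-- Sketch of $METASTORE_HOME/conf/metastore-site.xml.
     The host name and path below are hypothetical examples. -->
<configuration>
  <property>
    <name>metastore.warehouse.dir</name>
    <value>hdfs://namenode.example.com:8020/user/hive/warehouse</value>
  </property>
</configuration>
```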
Configuration values specific to running the Metastore with various RDBMSs, embedded or as a service, and without Hive are discussed in the relevant sections. The following configuration values apply to the Metastore regardless of how it is being run. This table covers only commonly customized configuration values. For less commonly changed configuration values see Less Commonly Changed Configuration Parameters.
Parameter | Hive 2.0 Parameter | Default Value | Description |
---|---|---|---|
metastore.warehouse.dir | hive.metastore.warehouse.dir | | URI of the default location for tables in the default catalog and database. |
metastore.authorization.storage.checks | hive.metastore.authorization.storage.checks | false | Should the metastore do authorization checks against the underlying storage? For example, a drop-partition call is disallowed if the user does not have permission to delete the corresponding directory from the storage. |
datanucleus.schema.autoCreateAll | datanucleus.schema.autoCreateAll | false | Automatically creates the necessary schema in the RDBMS at startup if one does not exist. Set this to false after the schema has been created once. To enable auto-creation, also set hive.metastore.schema.verification=false. Auto-creation is not recommended in production; run the schematool command instead. |
metastore.schema.verification | hive.metastore.schema.verification | true | Enforce metastore schema version consistency. When set to true, verify that the version information stored in the metastore is compatible with the one from the Hive jars, and disable automatic schema migration. Users must then migrate the schema manually after a Hive upgrade, which ensures proper metastore schema migration. |
metastore.hmshandler.retry.attempts | hive.hmshandler.retry.attempts | 10 | The number of times to retry a call to the metastore when there is a connection error. |
metastore.hmshandler.retry.interval | hive.hmshandler.retry.interval | 2 sec | Time between retry attempts. |
metastore.log4j.file | hive.log4j.file | none | Log4j configuration file. If unset, the Metastore looks for metastore-log4j2.properties in $METASTORE_HOME/conf. |
metastore.stats.autogather | hive.stats.autogather | true | Whether to automatically gather basic statistics during insert commands. |
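To make the table concrete, the fragment below sketches how a few of these values could be overridden in metastore-site.xml. The chosen values are illustrative assumptions rather than recommendations. Only boolean and integer settings are shown, so that no value-format details beyond those in the table are assumed.

```xml
<configuration>
  <!-- Keep schema verification on (the documented default). -->
  <property>
    <name>metastore.schema.verification</name>
    <value>true</value>
  </property>
  <!-- Leave schema auto-creation off; the table above recommends schematool. -->
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>false</value>
  </property>
  <!-- Illustrative override: retry fewer times than the default of 10. -->
  <property>
    <name>metastore.hmshandler.retry.attempts</name>
    <value>5</value>
  </property>
  <!-- Illustrative override: do not gather basic statistics on insert. -->
  <property>
    <name>metastore.stats.autogather</name>
    <value>false</value>
  </property>
</configuration>
```

Because the old names are still honored, the same overrides could presumably be written with the Hive 2.0 names from the second column, for example hive.hmshandler.retry.attempts.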
RDBMS
Option 1: Embedding Derby
Option 2: External RDBMS
Supported RDBMSs
TRY_DIRECT_SQL_DDL and Postgres
Installing, Upgrading, and Checking Metastore Tables in the RDBMS
Running the Metastore
Embedding the Metastore in Your Process
Security Considerations
Metastore Server
Parameter | Description |
---|---|
javax.jdo.option.ConnectionURL | JDBC connection string for the data store which contains metadata |
javax.jdo.option.ConnectionDriverName | JDBC Driver class name for the data store which contains metadata |
hive.metastore.uris | |
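The following sketch shows how these three parameters might appear in a configuration file. All host names and the database name are hypothetical, and PostgreSQL is used purely as an example of an external RDBMS. The hive.metastore.uris entry is assumed here to be the Thrift address that clients use to reach the Metastore service, with 9083 as the conventional port.

```xml
<configuration>
  <!-- Where the Metastore stores its metadata (hypothetical PostgreSQL host and database). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://db.example.com:5432/metastore</value>
  </property>
  <!-- JDBC driver class matching the URL above. -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <!-- Assumed client-side setting: Thrift URI of the Metastore service (hypothetical host). -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
```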
THRIFT_URI_SELECTION
Starting and Stopping the Service
Remember to discuss command line options like defining a configuration value
High Availability
Securing the Service
CLIENT_KERBEROS_PRINCIPAL, KERBEROS_*, SSL*, USE_SSL, USE_THRIFT_SASL
Running the Metastore Without Hive
Less Commonly Changed Configuration Parameters
BATCHED_RETRIEVE_*, CLIENT_CONNECT_RETRY_DELAY, FILTER_HOOK, SERDES_USING_METASTORE_FOR_SCHEMA, SERVER_*_THREADS, THREAD_POOL_SIZE
Security: EXECUTE_SET_UGI
Setting up Caching: CACHED*, CATALOGS_TO_CACHE & AGGREGATE_STATS_CACHE*
Transactions: MAX_OPEN_TXNS, TXNS_*