...

Configuration values specific to running the Metastore with various RDBMSs, embedded or as a service, and without Hive are discussed in the relevant sections.  The following configuration values apply to the Metastore regardless of how it is being run.  This table covers only commonly customized configuration values.  For less commonly changed configuration values see Less Commonly Changed Configuration Parameters.

 

| Parameter | Hive 2.0 Parameter | Default Value | Description |
| --- | --- | --- | --- |
| metastore.warehouse.dir | hive.metastore.warehouse.dir | | URI of the default location for tables in the default catalog and database. |
| datanucleus.schema.autoCreateAll | datanucleus.schema.autoCreateAll | false | Automatically creates the necessary schema in the RDBMS at startup if one does not exist. Set this to false after creating the schema once. To enable auto-creation, also set hive.metastore.schema.verification=false. Auto-creation is not recommended in production; run schematool instead. |
| metastore.schema.verification | hive.metastore.schema.verification | true | Enforces Metastore schema version consistency. When set to true: verifies that the version information stored in the Metastore is compatible with the version of the Metastore jar, and disables automatic schema migration. Users must then migrate the schema manually after an upgrade, which ensures proper schema migration. When set to false: warns if the version information stored in the Metastore doesn't match the version of the Metastore jar. |
| metastore.hmshandler.retry.attempts | hive.hmshandler.retry.attempts | 10 | The number of times to retry a call to the Metastore when there is a connection error. |
| metastore.hmshandler.retry.interval | hive.hmshandler.retry.interval | 2 sec | Time between retry attempts. |
| metastore.log4j.file | hive.log4j.file | none | Log4j configuration file. If unset, the Metastore looks for metastore-log4j2.properties in $METASTORE_HOME/conf. |
| metastore.stats.autogather | hive.stats.autogather | true | Whether to automatically gather basic statistics during insert commands. |
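As an illustration of how a few of the parameters above might be written out, the following minimal sketch renders a dictionary of settings into Hadoop-style `<property>` XML. It assumes the Metastore reads this format from a site file (commonly named metastore-site.xml, which this section does not state). The helper name `render_site_xml` and the warehouse URI are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_site_xml(settings):
    """Render {name: value} pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for illustration only.
example_settings = {
    "metastore.warehouse.dir": "hdfs://namenode.example.com:8020/warehouse",
    "metastore.schema.verification": "true",
    "metastore.hmshandler.retry.attempts": "10",
}

if __name__ == "__main__":
    print(render_site_xml(example_settings))
```

Values are passed as strings so that they appear in the file exactly as typed, rather than in Python's own formatting.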

...

Except in the case of HiveServer2, using this mode raises a few concerns.  First, having many clients will put a burden on the backing RDBMS since each client will have its own set of connections.


...

Second, every client must have read/write access to the RDBMS, which makes it hard to properly secure the RDBMS.  Therefore, embedded mode is not recommended in production use cases, with the exception of HiveServer2.

Metastore Server

To run the Metastore as a service, you must first configure it with a URL.

| Configured On | Parameter | Hive 2 Parameter | Format | Default Value | Comment |
| --- | --- | --- | --- | --- | --- |
| Client | metastore.thrift.uris | hive.metastore.uris | thrift://<HOST>:<PORT>[, thrift://<HOST>:<PORT>...] | none | HOST = hostname, PORT = port, default is 9083. |
| Server | metastore.thrift.port | hive.metastore.port | integer | 9083 | Port Thrift will listen on. |
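To make the URI format concrete, here is a minimal sketch that splits a metastore.thrift.uris value into host and port pairs. It is not the client's actual parsing code. The function name `parse_thrift_uris` and the example hostnames are hypothetical, and falling back to 9083 when a URI omits the port is an assumption of this sketch rather than confirmed client behavior.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 9083  # default Thrift port listed in the table above


def parse_thrift_uris(value):
    """Split a comma-separated list of thrift:// URIs into (host, port) pairs."""
    servers = []
    for raw in value.split(","):
        uri = raw.strip()
        if not uri:
            continue  # tolerate trailing commas and stray whitespace
        parts = urlsplit(uri)
        if parts.scheme != "thrift" or not parts.hostname:
            raise ValueError(f"not a thrift:// URI: {uri!r}")
        # Assumption of this sketch: use 9083 when no port is given.
        servers.append((parts.hostname, parts.port or DEFAULT_PORT))
    return servers


if __name__ == "__main__":
    print(parse_thrift_uris("thrift://ms1.example.com:9083, thrift://ms2.example.com:9084"))
```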

Once you have configured your clients, you can start the Metastore on a server using the start-metastore utility.  See the -help option of that utility for available options.  There is no stop-metastore script.  Instead, you must locate the process ID of the Metastore and kill that process.

High Availability

The Metastore service is stateless.  This allows you to start multiple instances of the service to provide for high availability.  It also allows you to configure some clients to embed the Metastore (e.g. HiveServer2) while still running a Metastore service for other clients.  If you are running multiple Metastore services, you can put all their URIs into your client's metastore.thrift.uris value and then set metastore.thrift.uri.selection (hive.metastore.uri.selection in Hive 2) to RANDOM or SEQUENTIAL.  RANDOM will cause your client to randomly select one of the servers in the list, while SEQUENTIAL will cause it to start at the beginning of the list and attempt to connect to each server in order.
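The difference between the two selection modes can be sketched as follows. This is an illustrative model, not the client's implementation: the names `connection_order`, `connect`, and `try_connect` are hypothetical, and modeling RANDOM as a shuffle of the whole list (so that the remaining servers serve as fallbacks) is an assumption.

```python
import random


def connection_order(uris, selection, rng=random):
    """Return the order in which a client would try the configured servers."""
    order = list(uris)
    if selection == "RANDOM":
        rng.shuffle(order)  # assumption: a random first pick, the rest as fallbacks
    elif selection != "SEQUENTIAL":
        raise ValueError(f"unknown selection mode: {selection!r}")
    return order


def connect(uris, selection, try_connect, rng=random):
    """Try each server in turn and return the first successful connection."""
    for uri in connection_order(uris, selection, rng):
        try:
            return try_connect(uri)
        except ConnectionError:
            continue  # this instance is unreachable; move on to the next one
    raise ConnectionError("no Metastore instance reachable")
```

Because the service is stateless, any instance that accepts the connection can serve the client, which is why a simple try-the-next-one loop is enough for failover in this sketch.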

Securing the Service

CLIENT_KERBEROS_PRINCIPAL, KERBEROS_*, SSL*, USE_SSL, USE_THRIFT_SASL

...