Comment: separating details of configuring metastore database vs server

...

All metadata for Hive tables and partitions is accessed through the Hive Metastore. Metadata is persisted using the JPOX ORM solution, so any database that it supports can be used by Hive. Most commercial relational databases and many open source databases are supported. See the list of supported databases in the section below.

You can find an E/R diagram for the metastore here.

There are 2 different ways to set up the metastore server and metastore database using different Hive configurations:

Configuration options for metastore database where metadata is persisted

Configuration options for metastore server

...

The default configuration sets up an embedded metastore which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.

Local/Embedded Metastore Database (Derby)

An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not really a practical solution but works well for unit tests.

For unit tests, the Local/Embedded Metastore Server configuration for the metastore server is used in conjunction with the embedded database.

Derby is the default database for the embedded metastore.

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true | Derby database located at hive/trunk/build... |
| javax.jdo.option.ConnectionDriverName | org.apache.derby.jdbc.EmbeddedDriver | Derby embedded JDBC driver class. |
| hive.metastore.uris | Not needed since this is a local metastore. | |
| hive.metastore.local | true | Embedded is local. |
| hive.metastore.warehouse.dir | file://${user.dir}/../build/ql/test/data/warehouse | Unit test data goes in here on your local filesystem. |

 

If you want to run Derby as a network server so the metastore can be accessed from multiple nodes, see Hive Using Derby in Server Mode.

...

 

Remote Metastore Database

In this configuration, you would use a traditional standalone RDBMS server. The following example configuration will set up a metastore in a MySQL server.

...

This configuration of metastore database is recommended for any real use.

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to MySQL server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to MySQL server |
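
The four properties above are normally written to hive-site.xml as property elements. The following is a minimal sketch of generating that fragment. The helper name render_hive_site is hypothetical (it is not part of Hive), and the host, database name, and credentials are placeholder values chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def render_hive_site(props):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print when the running Python provides ET.indent (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder values (assumptions): substitute your own host, database, and credentials.
mysql_metastore = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://db.example.com/metastore?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

if __name__ == "__main__":
    print(render_hive_site(mysql_metastore))
```

Building the XML through ElementTree rather than string concatenation means special characters in values (for example an ampersand in a longer JDBC URL) are escaped automatically.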

Local/Embedded Metastore Server

In a local/embedded metastore setup, the metastore server component is used like a library within the Hive client. Each Hive client opens a connection to the database and makes SQL queries against it. Because this is a local store, make sure that the database is accessible from the machines where Hive queries are executed and that the JDBC client library is in the classpath of the Hive client. This configuration is often used with HiveServer2. To use an embedded metastore only with HiveServer2, add "-hiveconf hive.metastore.uris=' '" to the command line parameters of the hiveserver2 start command, or use hiveserver2-site.xml (available in Hive 0.14).

| Config Param | Config Value | Comment |
|---|---|---|
| hive.metastore.uris | not needed because this is local store | |
| hive.metastore.local | true | this is local store (removed in Hive 0.10, see configuration description section). |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
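
With hive.metastore.local removed in Hive 0.10, the choice between an embedded and a remote metastore server follows from hive.metastore.uris: a blank value means embedded, anything else means remote. The sketch below illustrates that rule only. The function name metastore_mode and the sample host name are illustrative assumptions, not Hive APIs.

```python
def metastore_mode(conf):
    """Return 'embedded' when hive.metastore.uris is unset or blank, else 'remote'."""
    uris = conf.get("hive.metastore.uris", "")
    return "remote" if uris.strip() else "embedded"


if __name__ == "__main__":
    # A single space, as passed with -hiveconf hive.metastore.uris=' '
    print(metastore_mode({"hive.metastore.uris": " "}))
    # Hypothetical remote metastore address
    print(metastore_mode({"hive.metastore.uris": "thrift://metastore.example.com:9083"}))
```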

Remote Metastore Server

In a remote metastore setup, all Hive clients connect to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and client communicate using the Thrift protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:

...
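
Clients locate a remote metastore through hive.metastore.uris, which holds a comma-separated list of thrift://host:port entries. The sketch below splits such a value into host and port pairs, for example before a connectivity check. The function name and host names are made up for illustration, and the fallback port 9083 is the commonly used metastore port, assumed here rather than taken from this page.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 9083  # assumption: commonly used metastore port


def parse_metastore_uris(value):
    """Split a hive.metastore.uris value into (host, port) tuples."""
    endpoints = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blanks and trailing commas
        parts = urlsplit(raw)
        if parts.scheme != "thrift" or not parts.hostname:
            raise ValueError(f"not a thrift:// URI: {raw!r}")
        endpoints.append((parts.hostname, parts.port or DEFAULT_PORT))
    return endpoints


if __name__ == "__main__":
    # Hypothetical host names
    print(parse_metastore_uris("thrift://ms1.example.com:9083, thrift://ms2.example.com"))
```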

Server Configuration Parameters

The following example uses a Remote Metastore Database.

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to MySQL server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to MySQL server |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |

...

If you are using MySQL as the datastore for metadata, put the MySQL client JDBC libraries in HIVE_HOME/lib before starting the Hive Client or HiveMetastore Server.
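
A quick pre-flight check for the driver jar can catch a missing library before startup. The sketch below looks in HIVE_HOME/lib for files matching mysql-connector-*.jar. That file name pattern is an assumption based on how MySQL Connector/J jars are usually named, and the function name find_mysql_jars is hypothetical; adjust both to your installation.

```python
import os
from pathlib import Path


def find_mysql_jars(hive_home):
    """Return sorted names of MySQL connector jars found in <hive_home>/lib."""
    lib = Path(hive_home) / "lib"
    # Assumed naming pattern for MySQL Connector/J jars.
    return sorted(p.name for p in lib.glob("mysql-connector-*.jar"))


if __name__ == "__main__":
    hive_home = os.environ.get("HIVE_HOME")
    if hive_home is None:
        print("HIVE_HOME is not set")
    else:
        jars = find_mysql_jars(hive_home)
        print(jars if jars else "no MySQL connector jar found in HIVE_HOME/lib")
```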

...