Introduction
All the metadata for Hive tables and partitions is stored in the Hive Metastore. Metadata is persisted using the JPOX ORM solution, so any datastore supported by JPOX can be used. Most commercial relational databases and many open source datastores are supported. Any datastore that has a JDBC driver can probably be used.
You can find an E/R diagram for the metastore here.
There are three different ways to set up a metastore server, each using a different Hive configuration. The relevant configuration parameters are:
Config Param | Description |
---|---|
javax.jdo.option.ConnectionURL | JDBC connection string for the data store which contains metadata |
javax.jdo.option.ConnectionDriverName | JDBC Driver class name for the data store which contains metadata |
hive.metastore.uris | Hive connects to this URI to make metadata requests for a remote metastore |
hive.metastore.local | local or remote metastore |
hive.metastore.warehouse.dir | URI of the default location for native tables |
Default configuration sets up an embedded metastore which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.
Embedded Metastore
The embedded metastore is mainly used for unit tests, and only one process can connect to the metastore at a time. So it is not really a practical solution, but it works well for unit tests.
Config Param | Config Value | Comment |
---|---|---|
javax.jdo.option.ConnectionURL | | derby database located at hive/trunk/build... |
javax.jdo.option.ConnectionDriverName | org.apache.derby.jdbc.EmbeddedDriver | Derby embedded JDBC driver class |
hive.metastore.uris | not needed since this is a local metastore | |
hive.metastore.local | true | embedded is local |
hive.metastore.warehouse.dir | | unit test data goes in here on your local filesystem |
If you want to run the metastore as a network server so it can be accessed from multiple nodes try HiveDerbyServerMode.
Local Metastore
In local metastore setup, each Hive Client opens a connection to the datastore and makes SQL queries against it. The following config sets up a metastore in a MySQL server. Make sure that the server is accessible from the machines where Hive queries are executed, since this is a local store. Also make sure the JDBC client library is in the classpath of the Hive Client.
Config Param | Config Value | Comment |
---|---|---|
javax.jdo.option.ConnectionURL | | metadata is stored in a MySQL server |
javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to mysql server |
javax.jdo.option.ConnectionPassword | <password> | password for connecting to mysql server |
hive.metastore.uris | not needed because this is local store | |
hive.metastore.local | true | this is local store |
hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
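As a rough illustration of the local setup, the parameters above can be written into a hive-site.xml file. The sketch below is only one way to do it: the host, database name, user, password, warehouse path, and output directory are all placeholder assumptions, `prop` is a hypothetical helper, and the JDBC URL pattern is borrowed from the Server Configuration Parameters table in the Remote Metastore section.

```shell
#!/bin/sh
# Sketch: generate a hive-site.xml for a local metastore backed by MySQL.
# Every default below is a placeholder chosen for illustration.
DB_HOST="${DB_HOST:-db.example.com}"
DB_NAME="${DB_NAME:-hive_metastore}"
DB_USER="${DB_USER:-hiveuser}"
DB_PASS="${DB_PASS:-changeme}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"
OUT_DIR="${OUT_DIR:-./metastore-local-example}"

# Print one <property> element. Values are not XML-escaped, so avoid
# characters such as & and < in them.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

mkdir -p "$OUT_DIR"
{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop javax.jdo.option.ConnectionURL "jdbc:mysql://${DB_HOST}/${DB_NAME}?createDatabaseIfNotExist=true"
  prop javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
  prop javax.jdo.option.ConnectionUserName "$DB_USER"
  prop javax.jdo.option.ConnectionPassword "$DB_PASS"
  prop hive.metastore.local true
  prop hive.metastore.warehouse.dir "$WAREHOUSE_DIR"
  echo '</configuration>'
} > "$OUT_DIR/hive-site.xml"
```

The generated file would then be reviewed and copied into the Hive configuration directory of each client machine. hive.metastore.uris is left out because it is not needed for a local store.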
Remote Metastore
In remote metastore setup, all Hive Clients make a connection to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and client communicate using the Thrift protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:
Code Block |
---|
hive --service metastore |
In versions of Hive earlier than 0.5.0, it's instead necessary to run the thrift server via direct execution of Java:
Code Block |
---|
$JAVA_HOME/bin/java -Xmx1024m -Dlog4j.configuration=file://$HIVE_HOME/conf/hms-log4j.properties -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/ -cp $CLASSPATH org.apache.hadoop.hive.metastore.HiveMetaStore |
If you execute Java directly, then JAVA_HOME, HIVE_HOME, HADOOP_HOME must be correctly set; CLASSPATH should contain Hadoop, Hive (lib and auxlib), and Java jars.
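One way to assemble such a CLASSPATH is to collect every jar from the relevant directories. The sketch below assumes the Hadoop jars sit directly under HADOOP_HOME and HADOOP_HOME/lib, which may not match your installation; `build_classpath` is a hypothetical helper, not part of Hive or Hadoop, and any additional Java jars would need to be appended separately.

```shell
#!/bin/sh
# Print a colon-separated list of every jar found in the given directories.
build_classpath() {
  result=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # Skip the unexpanded pattern when a directory is missing or has no jars.
      [ -e "$jar" ] || continue
      result="${result:+$result:}$jar"
    done
  done
  printf '%s\n' "$result"
}

# Only build the real CLASSPATH when both variables are set.
# The directory layout here is an assumption; adjust it to your installation.
if [ -n "${HADOOP_HOME:-}" ] && [ -n "${HIVE_HOME:-}" ]; then
  CLASSPATH="$(build_classpath "$HADOOP_HOME" "$HADOOP_HOME/lib" "$HIVE_HOME/lib" "$HIVE_HOME/auxlib")"
  export CLASSPATH
fi
```

Listing jars explicitly, rather than relying on wildcard classpath entries, keeps the result usable with older Java versions.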
Server Configuration Parameters
Config Param | Config Value | Comment |
---|---|---|
javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to mysql server |
javax.jdo.option.ConnectionPassword | <password> | password for connecting to mysql server |
hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
Client Configuration Parameters
Config Param | Config Value | Comment |
---|---|---|
hive.metastore.uris | thrift://<host_name>:<port> | host and port for the thrift metastore server |
hive.metastore.local | false | this is a remote store |
hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
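The client side of a remote setup needs only these three parameters. As a minimal sketch, they could be written to a hive-site.xml like this; the host name, warehouse path, and output directory are placeholder assumptions, port 9083 is assumed here as the metastore's usual default, and `prop` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: generate a client-side hive-site.xml that points at a remote
# metastore server. All defaults below are placeholders for illustration.
METASTORE_HOST="${METASTORE_HOST:-metastore.example.com}"
METASTORE_PORT="${METASTORE_PORT:-9083}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"
OUT_DIR="${OUT_DIR:-./metastore-client-example}"

# Print one <property> element. Values are not XML-escaped.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

mkdir -p "$OUT_DIR"
{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop hive.metastore.uris "thrift://${METASTORE_HOST}:${METASTORE_PORT}"
  prop hive.metastore.local false
  prop hive.metastore.warehouse.dir "$WAREHOUSE_DIR"
  echo '</configuration>'
} > "$OUT_DIR/hive-site.xml"
```

No JDBC settings appear here because, in this setup, only the metastore server talks to the datastore.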
If you are using MySQL as the datastore for metadata, put MySQL client libraries in HIVE_HOME/lib before starting Hive Client or HiveMetastore Server.
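A quick pre-flight check can catch a missing driver before Hive is started. The sketch below assumes the MySQL client library is a jar whose name starts with `mysql-connector-`, which is a naming assumption rather than something Hive requires; `has_mysql_driver` is a hypothetical helper.

```shell
#!/bin/sh
# Return 0 if the given directory contains a jar that looks like the
# MySQL JDBC driver (file name pattern is an assumption), 1 otherwise.
has_mysql_driver() {
  for jar in "$1"/mysql-connector-*.jar; do
    [ -e "$jar" ] && return 0
  done
  return 1
}

# Only run the check when HIVE_HOME is set.
if [ -n "${HIVE_HOME:-}" ]; then
  if has_mysql_driver "$HIVE_HOME/lib"; then
    echo "MySQL JDBC driver found in $HIVE_HOME/lib"
  else
    echo "No MySQL JDBC driver jar found in $HIVE_HOME/lib" >&2
  fi
fi
```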
Metastore Deployment Options in Pictures