Introduction

All the metadata for Hive tables and partitions is stored in the Hive Metastore. Metadata is persisted using the JPOX ORM solution, so any datastore that JPOX supports can be used. Most commercial relational databases and many open-source datastores are supported; any datastore that has a JDBC driver can probably be used.

You can find an E/R diagram for the metastore here.

There are three ways to set up the metastore (embedded, local and remote), each using a different Hive configuration. The relevant configuration parameters are:

||Config Param||Description||
|javax.jdo.option.ConnectionURL|JDBC connection string for the data store that contains the metadata|
|javax.jdo.option.ConnectionDriverName|JDBC driver class name for the data store that contains the metadata|
|hive.metastore.uris|URI that Hive connects to in order to make metadata requests to a remote metastore|
|hive.metastore.local|whether the metastore is local (true) or remote (false)|
|hive.metastore.warehouse.dir|URI of the default location for native tables|

The default configuration sets up an embedded metastore, which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.

Embedded Metastore

The embedded metastore is mainly used for unit tests, and only one process can connect to it at a time. It is therefore not really a practical solution, but it works well for unit tests.

||Config Param||Config Value||Comment||
|javax.jdo.option.ConnectionURL|jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true|Derby database located at hive/trunk/build...|
|javax.jdo.option.ConnectionDriverName|org.apache.derby.jdbc.EmbeddedDriver|Derby embedded JDBC driver class|
|hive.metastore.uris|not needed since this is a local metastore| |
|hive.metastore.local|true|embedded is local|
|hive.metastore.warehouse.dir|file://${user.dir}/../build/ql/test/data/warehouse|unit test data goes in here on your local filesystem|
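As a sketch only, the embedded settings in the table above could be written in hive-site.xml, the site-specific override file in the Hive conf directory. The file name and the Hadoop-style configuration/property layout are assumptions not stated on this page; the property names and values are copied from the table.

{code:xml}
<?xml version="1.0"?>
<!-- Sketch of hive-site.xml for the embedded (Derby) metastore used by the unit tests -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- create=true tells Derby to create the database if it does not exist yet -->
    <value>jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
  <!-- hive.metastore.uris is left unset: the metastore runs inside the Hive process -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file://${user.dir}/../build/ql/test/data/warehouse</value>
  </property>
</configuration>
{code}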

If you want to run the metastore as a network server so that it can be accessed from multiple nodes, try HiveDerbyServerMode.

Local Metastore

In a local metastore setup, each Hive client opens a connection to the datastore and makes SQL queries against it. The following configuration sets up a metastore in a MySQL server. Because this is a local store, make sure that the server is accessible from the machines where Hive queries are executed, and that the JDBC client library is in the classpath of the Hive client.

||Config Param||Config Value||Comment||
|javax.jdo.option.ConnectionURL|jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true|metadata is stored in a MySQL server|
|javax.jdo.option.ConnectionDriverName|com.mysql.jdbc.Driver|MySQL JDBC driver class|
|javax.jdo.option.ConnectionUserName|<user name>|user name for connecting to the MySQL server|
|javax.jdo.option.ConnectionPassword|<password>|password for connecting to the MySQL server|
|hive.metastore.uris|not needed because this is a local store| |
|hive.metastore.local|true|this is a local store|
|hive.metastore.warehouse.dir|<base hdfs path>|default location for Hive tables|
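A minimal hive-site.xml sketch of the local MySQL settings in the table above follows, assuming the usual Hadoop-style configuration file. DB_HOST, DB_NAME, DB_USER, DB_PASSWORD and WAREHOUSE_DIR are hypothetical placeholders for <host name>, <database name>, <user name>, <password> and <base hdfs path>; the angle-bracket form cannot be used literally inside an XML value.

{code:xml}
<?xml version="1.0"?>
<!-- Sketch of hive-site.xml for a local metastore backed by MySQL.
     DB_HOST, DB_NAME, DB_USER, DB_PASSWORD and WAREHOUSE_DIR are hypothetical placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- createDatabaseIfNotExist=true lets the MySQL driver create the database on first connect -->
    <value>jdbc:mysql://DB_HOST/DB_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>DB_USER</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>DB_PASSWORD</value>
  </property>
  <!-- hive.metastore.uris is not set: each Hive client queries MySQL directly -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>WAREHOUSE_DIR</value>
  </property>
</configuration>
{code}

Since every client connects to MySQL itself in this mode, the same file (including the password) has to be present on every machine that runs Hive queries.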

Remote Metastore

In a remote metastore setup, all Hive clients connect to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and clients communicate using the Thrift protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:

{code}
hive --service metastore
{code}

In versions of Hive earlier than 0.5.0, it is instead necessary to run the Thrift server by executing Java directly:

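{code}
# Hive versions before 0.5.0 only: start the metastore Thrift server by running HiveMetaStore directly.
# JAVA_HOME, HIVE_HOME, HADOOP_HOME and CLASSPATH must already be set.
$JAVA_HOME/bin/java -Xmx1024m \
  -Dlog4j.configuration=file://$HIVE_HOME/conf/hms-log4j.properties \
  -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/ \
  -cp $CLASSPATH \
  org.apache.hadoop.hive.metastore.HiveMetaStore
{code}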

If you execute Java directly, then JAVA_HOME, HIVE_HOME and HADOOP_HOME must be set correctly, and CLASSPATH should contain the Hadoop, Hive (lib and auxlib) and Java jars.

Server Configuration Parameters

||Config Param||Config Value||Comment||
|javax.jdo.option.ConnectionURL|jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true|metadata is stored in a MySQL server|
|javax.jdo.option.ConnectionDriverName|com.mysql.jdbc.Driver|MySQL JDBC driver class|
|javax.jdo.option.ConnectionUserName|<user name>|user name for connecting to the MySQL server|
|javax.jdo.option.ConnectionPassword|<password>|password for connecting to the MySQL server|
|hive.metastore.warehouse.dir|<base hdfs path>|default location for Hive tables|
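On the metastore server host, the settings in the table above might be written as the following hive-site.xml sketch, assuming the usual Hadoop-style configuration file. DB_HOST, DB_NAME, DB_USER, DB_PASSWORD and WAREHOUSE_DIR are hypothetical placeholders for <host name>, <database name>, <user name>, <password> and <base hdfs path>.

{code:xml}
<?xml version="1.0"?>
<!-- Sketch of hive-site.xml on the metastore server host (the only process that talks to MySQL).
     DB_HOST, DB_NAME, DB_USER, DB_PASSWORD and WAREHOUSE_DIR are hypothetical placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://DB_HOST/DB_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>DB_USER</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>DB_PASSWORD</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>WAREHOUSE_DIR</value>
  </property>
</configuration>
{code}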

Client Configuration Parameters

||Config Param||Config Value||Comment||
|hive.metastore.uris|thrift://<host_name>:<port>|host and port for the Thrift metastore server|
|hive.metastore.local|false|this is a remote store|
|hive.metastore.warehouse.dir|<base hdfs path>|default location for Hive tables|
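On each Hive client, a corresponding hive-site.xml sketch might look like this, assuming the usual Hadoop-style configuration file. METASTORE_HOST, METASTORE_PORT and WAREHOUSE_DIR are hypothetical placeholders for <host_name>, <port> and <base hdfs path>. Matching the client table above, no javax.jdo.option.* settings appear, because only the metastore server queries the datastore.

{code:xml}
<?xml version="1.0"?>
<!-- Sketch of hive-site.xml on a Hive client that uses a remote metastore.
     METASTORE_HOST, METASTORE_PORT and WAREHOUSE_DIR are hypothetical placeholders. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Thrift endpoint of the metastore server -->
    <value>thrift://METASTORE_HOST:METASTORE_PORT</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <!-- false: metadata requests go to the remote server instead of the datastore -->
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>WAREHOUSE_DIR</value>
  </property>
</configuration>
{code}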

If you are using MySQL as the datastore for metadata, put the MySQL client libraries in HIVE_HOME/lib before starting the Hive client or the HiveMetastore server.

Metastore Deployment Options in Pictures

^metastore_usage.pptx