...
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --env DB_DRIVER=postgres \
--env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/metastore_db -Djavax.jdo.option.ConnectionUserName=hive -Djavax.jdo.option.ConnectionPassword=password" \
--mount source=warehouse,target=/opt/hive/data/warehouse \
--mount type=bind,source=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar,target=/opt/hive/lib/postgres.jar \
--name metastore-standalone apache/hive:4.0.0
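Each `javax.jdo.option.*` setting in SERVICE_OPTS is a single `-Dkey=value` token, and the JDBC URL is the value of `ConnectionURL`. As a minimal sketch, the string can be assembled once from its parts and then passed as `--env SERVICE_OPTS="$SERVICE_OPTS"`. The helper name `build_service_opts` is hypothetical; the URL and credentials are the example values from the command above.

```shell
# Hypothetical helper: builds the SERVICE_OPTS value for a Postgres-backed Metastore.
# Arguments: <jdbc-url> <db-user> <db-password>
build_service_opts() {
  db_url="$1"; db_user="$2"; db_password="$3"
  # One -Dkey=value token per JDO property, separated by single spaces.
  printf '%s' "-Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver"
  printf ' %s' "-Djavax.jdo.option.ConnectionURL=$db_url"
  printf ' %s' "-Djavax.jdo.option.ConnectionUserName=$db_user"
  printf ' %s\n' "-Djavax.jdo.option.ConnectionPassword=$db_password"
}

SERVICE_OPTS="$(build_service_opts 'jdbc:postgresql://postgres:5432/metastore_db' hive password)"
echo "$SERVICE_OPTS"
```

Keeping the URL as an argument makes it harder to detach it from its `ConnectionURL=` key when editing a long one-line command.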
...
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --env DB_DRIVER=postgres \
-v /opt/hive/conf:/hive_custom_conf --env HIVE_CUSTOM_CONF_DIR=/hive_custom_conf \
--mount type=bind,source=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar,target=/opt/hive/lib/postgres.jar \
--name metastore apache/hive:4.0.0
For Hive releases before 4.0, if you want to upgrade the existing external Metastore schema to the target version, add "--env SCHEMA_COMMAND=upgradeSchema" to the command.
To skip schematool initialisation or upgrade for the Metastore, use "--env IS_RESUME=true", and for verbose logging set "--env VERBOSE=true".
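The optional settings above can be appended conditionally. This is a minimal sketch: the helper `metastore_extra_flags` and the toggles `UPGRADE`, `RESUME` and `VERBOSE_LOG` are hypothetical shell names chosen for illustration; only `SCHEMA_COMMAND`, `IS_RESUME` and `VERBOSE` come from the text above.

```shell
# Hypothetical helper: prints the extra --env flags for the Metastore container.
# UPGRADE, RESUME and VERBOSE_LOG are illustrative toggles (set to 1 to enable).
metastore_extra_flags() {
  flags=""
  # Schema upgrade; per the text above, only for Hive releases before 4.0.
  if [ "${UPGRADE:-0}" = 1 ]; then flags="$flags --env SCHEMA_COMMAND=upgradeSchema"; fi
  # Skip schematool initialisation or upgrade.
  if [ "${RESUME:-0}" = 1 ]; then flags="$flags --env IS_RESUME=true"; fi
  # Verbose logging.
  if [ "${VERBOSE_LOG:-0}" = 1 ]; then flags="$flags --env VERBOSE=true"; fi
  # Drop the leading space, if any.
  printf '%s\n' "${flags# }"
}

# Example: resume without schematool and log verbosely.
(RESUME=1 VERBOSE_LOG=1 metastore_extra_flags)
```

The printed flags contain no embedded spaces inside values, so they could be spliced unquoted into the `docker run` command for the Metastore.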
- HiveServer2
Launch the HiveServer2 with an embedded Metastore,
...
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 \
--env SERVICE_OPTS="-Dhive.metastore.uris=thrift://metastore:9083" \
--mount source=warehouse,target=/opt/hive/data/warehouse \
--env IS_RESUME="true" \
--name hiveserver2 apache/hive:4.0.0
- HiveServer2, Metastore
To get a quick overview of both HiveServer2 and Metastore, a docker-compose.yml is provided under packaging/src/docker. First, set POSTGRES_LOCAL_PATH to the location of the Postgres JDBC driver jar:
export POSTGRES_LOCAL_PATH=your_local_path_to_postgres_driver
Example:
mvn dependency:copy -Dartifact="org.postgresql:postgresql:42.5.1" && \
export POSTGRES_LOCAL_PATH=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
If Maven is not installed, or you have trouble resolving the Postgres driver, you can download the jar yourself and set POSTGRES_LOCAL_PATH to the path of the downloaded jar.
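The path in the example follows the Maven local-repository layout: the group ID's dots become directories, followed by the artifact ID, the version, and `<artifactId>-<version>.jar`. A minimal sketch of that mapping, using a hypothetical helper `maven_jar_path` and assuming Maven's default local repository `~/.m2/repository`, plus an early check that the jar exists:

```shell
# Hypothetical helper: maps Maven coordinates to a jar path inside a local repository.
# Usage: maven_jar_path <local-repo> <groupId> <artifactId> <version>
maven_jar_path() {
  group_dir="$(printf '%s' "$2" | tr '.' '/')"
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$1" "$group_dir" "$3" "$4" "$3" "$4"
}

# Assumes the default repository location; substitute the output of
# `mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout` if it differs.
POSTGRES_LOCAL_PATH="$(maven_jar_path "$HOME/.m2/repository" org.postgresql postgresql 42.5.1)"
export POSTGRES_LOCAL_PATH

# Warn before starting the services if the driver jar is missing.
if [ ! -f "$POSTGRES_LOCAL_PATH" ]; then
  echo "Postgres driver not found at $POSTGRES_LOCAL_PATH" >&2
fi
```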
Then,
docker compose up -d
This starts the HiveServer2, Metastore and Postgres services. Volumes persist the data that Hive generates inside the Postgres and HiveServer2 containers:
hive_db
The volume persists the metadata of Hive tables inside the Postgres container.
warehouse
The volume stores the tables' files inside the HiveServer2 container.
To stop and remove them all:
docker compose down
Usage
- HiveServer2 web UI
- Accessible in a browser at http://localhost:10002/
- Beeline:
docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'
# If beeline is installed on the host machine, HiveServer2 can simply be reached via:
beeline -u 'jdbc:hive2://localhost:10000/'
- Run some queries
show tables;
create table hive_example(a string, b int) partitioned by(c int);
alter table hive_example add partition(c=1);
insert into hive_example partition(c=1) values('a', 1), ('a', 2), ('b', 3);
select count(distinct a) from hive_example;
select sum(b) from hive_example;
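The insert statement writes three rows into partition c=1, so on a freshly created table the last two queries should return 2 (the distinct values 'a' and 'b') and 6 (1 + 2 + 3). As a quick cross-check outside Hive, the same aggregation over those rows can be reproduced with `awk`; this is only an illustration of the arithmetic, not how Hive evaluates the queries.

```shell
# The three rows inserted into hive_example partition(c=1): columns a and b.
rows='a 1
a 2
b 3'

# count(distinct a) and sum(b), mirroring the two SELECT statements.
result="$(printf '%s\n' "$rows" | awk '
  {
    if (!($1 in seen)) { seen[$1] = 1; distinct++ }
    total += $2
  }
  END { print distinct, total }')"
echo "$result"
```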