...
docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'
- Standalone Metastore
To use the standalone Metastore with Derby,
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standalone apache/hive:${HIVE_VERSION}
...
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore \
--env DB_DRIVER=postgres \
--env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/metastore_db -Djavax.jdo.option.ConnectionUserName=hive -Djavax.jdo.option.ConnectionPassword=password" \
--mount source=warehouse,target=/opt/hive/data/warehouse \
--mount type=bind,source=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar,target=/opt/hive/lib/postgres.jar \
--name metastore-standalone apache/hive:4.0.0
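The JDBC URL belongs inside the SERVICE_OPTS value, next to the driver name and credentials. That is easy to get wrong when you edit the long one-liner by hand. The sketch below assembles the value from separate settings and assumes you keep those settings in shell variables. The helper name build_service_opts is hypothetical and is not part of the apache/hive image. Its defaults mirror the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of apache/hive): assemble the SERVICE_OPTS
# value for a Postgres-backed Metastore from individual settings.
# Arguments: JDBC URL, user name, password (defaults match the command above).
build_service_opts() {
  local url="${1:-jdbc:postgresql://postgres:5432/metastore_db}"
  local user="${2:-hive}"
  local password="${3:-password}"
  # printf reuses '%s' for each argument, concatenating them into one string.
  printf '%s' \
    "-Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver" \
    " -Djavax.jdo.option.ConnectionURL=${url}" \
    " -Djavax.jdo.option.ConnectionUserName=${user}" \
    " -Djavax.jdo.option.ConnectionPassword=${password}"
}

# Usage: pass the result as a single quoted --env value, e.g.
#   docker run ... --env SERVICE_OPTS="$(build_service_opts)" ...
```

Keeping the value in one quoted string matters. Otherwise the shell would split the -D options into separate docker arguments.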
If you want to use your own hdfs-site.xml or yarn-site.xml for the service, pass the environment variable HIVE_CUSTOM_CONF_DIR to the command. For instance, put the custom configuration files under the host directory /opt/hive/conf, then run,
docker run -d -p 9083:9083 --env SERVICE_NAME=metastore \
--env DB_DRIVER=postgres \
-v /opt/hive/conf:/hive_custom_conf --env HIVE_CUSTOM_CONF_DIR=/hive_custom_conf \
--mount type=bind,source=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar,target=/opt/hive/lib/postgres.jar \
--name metastore apache/hive:4.0.0
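The sketch below prepares such a directory with a skeleton hdfs-site.xml. Two things in it are illustrative assumptions:

- The directory comes from a CONF_DIR variable that defaults to ./hive_custom_conf rather than /opt/hive/conf, so it runs without root.
- The dfs.replication property is only an example.

Replace both with your real settings, then mount the directory with -v as in the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative location (assumption); the command above mounts /opt/hive/conf.
CONF_DIR="${CONF_DIR:-./hive_custom_conf}"
mkdir -p "$CONF_DIR"

# Write a skeleton Hadoop configuration file. dfs.replication=1 is only an
# example property; replace it with the settings you actually need.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/hdfs-site.xml"
```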
For Hive releases before 4.0, if you want to upgrade the existing external Metastore schema to the target version, then add --env SCHEMA_COMMAND=upgradeSchema to the command.
To skip schematool initialisation or upgrade for the Metastore, use --env IS_RESUME="true". For verbose logging, set --env VERBOSE="true".
- HiveServer2
Launch the HiveServer2 with an embedded Metastore,
...
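To persist data between container restarts, you can start HiveServer2 with a volume. The command below also sets hive.metastore.uris, so HiveServer2 connects to an external Metastore at thrift://metastore:9083 instead of an embedded one,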
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 \
--env SERVICE_OPTS="-Dhive.metastore.uris=thrift://metastore:9083" \
--mount source=warehouse,target=/opt/hive/data/warehouse \
--env IS_RESUME="true" \
--name hiveserver2 apache/hive:4.0.0
- HiveServer2, Metastore
To get a quick overview of both HiveServer2 and Metastore, use the docker-compose.yml provided under packaging/src/docker. First, specify POSTGRES_LOCAL_PATH:
export POSTGRES_LOCAL_PATH=your_local_path_to_postgres_driver
Example:
mvn dependency:copy -Dartifact="org.postgresql:postgresql:42.5.1" && \
export POSTGRES_LOCAL_PATH=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
If Maven is not installed, or it has trouble resolving the Postgres driver, you can download the jar yourself. Then set POSTGRES_LOCAL_PATH to the path of the downloaded jar.
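Before starting the services, you can check that POSTGRES_LOCAL_PATH really names a readable jar, so a typo is caught before docker compose runs. The function name check_postgres_driver is hypothetical and is not part of Hive or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if POSTGRES_LOCAL_PATH is set
# and names a readable file ending in .jar.
check_postgres_driver() {
  local path="${POSTGRES_LOCAL_PATH:-}"
  if [ -z "$path" ]; then
    echo "POSTGRES_LOCAL_PATH is not set" >&2
    return 1
  fi
  case "$path" in
    *.jar) ;;
    *) echo "POSTGRES_LOCAL_PATH does not end in .jar: $path" >&2; return 1 ;;
  esac
  if [ ! -r "$path" ]; then
    echo "Cannot read $path" >&2
    return 1
  fi
  echo "Using Postgres driver at $path"
}

# Usage:
#   check_postgres_driver && docker compose up -d
```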
Then, from packaging/src/docker, run,
docker compose up -d
This starts the HiveServer2, Metastore and Postgres services. Volumes persist the data that Hive generates inside the Postgres and HiveServer2 containers:
- hive_db: persists the metadata of Hive tables inside the Postgres container.
- warehouse: stores the tables' files inside the HiveServer2 container.
To stop and remove all the containers (named volumes are kept unless you add the -v flag),
docker compose down
Usage
- HiveServer2 web UI
- Accessible in a browser at http://localhost:10002/
- Beeline:
docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'
# If beeline is installed on the host machine, HiveServer2 can simply be reached via:
beeline -u 'jdbc:hive2://localhost:10000/'
- Run some queries
show tables;
create table hive_example(a string, b int) partitioned by(c int);
alter table hive_example add partition(c=1);
insert into hive_example partition(c=1) values('a', 1), ('a', 2), ('b', 3);
select count(distinct a) from hive_example;
select sum(b) from hive_example;
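Only three rows are inserted, so the last two queries are easy to check by hand:

- Column a holds 'a', 'a', 'b', which gives two distinct values.
- Column b holds 1 + 2 + 3 = 6.

The sketch below recomputes both numbers from the same rows with awk, as an illustration of what Beeline should report. It does not connect to Hive.

```shell
#!/usr/bin/env bash
# The rows inserted into partition c=1 above, one "a b" pair per line.
rows='a 1
a 2
b 3'

# count(distinct a): number of unique values in the first column.
distinct_a=$(printf '%s\n' "$rows" | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')

# sum(b): total of the second column.
sum_b=$(printf '%s\n' "$rows" | awk '{ s += $2 } END { print s }')

echo "count(distinct a) = $distinct_a"   # expected: 2
echo "sum(b) = $sum_b"                   # expected: 6
```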