Why do we replace the existing Hive CLI?
Hive CLI is a legacy tool with two main use cases: it serves as a thick client for SQL on Hadoop, and as a command-line tool for HiveServer1. HiveServer1 is already deprecated and has been removed from the Hive code base, so the second use case no longer applies. For the first, Beeline provides, or is supposed to provide, equivalent functionality, although it is implemented differently from Hive CLI.
The Hive community has recommended the Beeline + HS2 configuration for some time, so ideally Hive CLI should be deprecated. Because Hive CLI is still widely used, we instead propose replacing its implementation with Beeline plus an embedded HS2, so that the Hive community needs to maintain only a single code path. Hive CLI then becomes just an alias for Beeline, at either the shell-script level or the high-level code level. The goal is that existing user scripts that use Hive CLI require no changes, or only minimal ones.
Old CLI functionality support
The old CLI functionality is implemented on top of Beeline. If an existing CLI feature is not supported by the new CLI, run the following command to fall back to the deprecated CLI tool:
export USE_DEPRECATED_CLI=true
The log4j configuration file for the new CLI is now "beeline-log4j.properties".
CLI options support
To get help, run "hive -H" or "hive --help".
usage: hive
 -d,--define <key=value>          Variable substitution to apply to hive commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
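As a minimal sketch of how these options can be combined, the script below writes a small query file that uses a substitution variable and then runs it in silent mode. The variable name tbl, the table test2, the database default, and the temporary file are illustrative assumptions, not part of the original page. The script only calls hive when it is found on the PATH.

```shell
#!/bin/sh
# Create a throwaway query file (hypothetical example content).
demo_sql="$(mktemp)"
cat > "$demo_sql" <<'EOF'
SELECT * FROM ${hivevar:tbl} LIMIT 5;
EOF

# Only invoke hive when it is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
  # -S: silent mode, --database: database to use,
  # --hivevar: value substituted for ${hivevar:tbl}, -f: run SQL from a file.
  hive -S --database default --hivevar tbl=test2 -f "$demo_sql"
  # -e: run a quoted query string directly from the command line.
  hive -e 'show tables;'
else
  echo "hive not found on PATH; skipping the example queries"
fi
```

The same invocations should behave identically under the old and the new CLI, which is the compatibility goal stated above.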
CLI Interactive Shell Commands support
Example of the source command:
hive> source /root/test.sql;
hive> show tables;
numbers_bucketed
test2
testavro2
CLI configuration support
Configuration Name | Supported in New CLI |
---|---|
hive.cli.errors.ignore | Yes |
hive.cli.prompt | Yes |
hive.cli.pretty.output.num.cols | No |
hive.cli.print.current.db | No |
Performance Impacts
Using JMH to measure the average time taken to retrieve a data set, we obtained the following result.
Benchmark | Mode | Samples | Score | Error | Units |
---|---|---|---|---|---|
o.a.h.b.c.CliBench.BeeLineDriverBench.testSQLWithInitialFile | avgt | 1 | 1713326099.000 | ± NaN | ns/op |
o.a.h.b.c.CliBench.CliDriverBench.testSQLWithInitialFile | avgt | 1 | 1852995786.000 | ± NaN | ns/op |
A lower score is better, since the benchmark measures elapsed time. The results show no clear performance gap between the two implementations when retrieving data.
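To put the two scores in more readable units, the following sketch converts them from ns/op to seconds and computes the relative difference. The variable names are illustrative. Because each benchmark has only one sample and no error estimate, the difference should be read as indicative only.

```shell
#!/bin/sh
# Scores copied from the JMH result above, in nanoseconds per operation.
beeline_ns=1713326099
cli_ns=1852995786

# Convert to seconds and express the gap as a percentage of the old CLI time.
summary=$(awk -v b="$beeline_ns" -v c="$cli_ns" 'BEGIN {
  printf "beeline=%.3fs cli=%.3fs diff=%.1f%%", b / 1000000000, c / 1000000000, (c - b) / c * 100
}')
echo "$summary"
```

Both runs take under two seconds per operation, which is consistent with the conclusion that there is no clear performance gap.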