Replacing the Implementation of Hive CLI Using Beeline

Why Replace the Existing Hive CLI?

Hive CLI is a legacy tool that had two main use cases: it served as a thick client for SQL on Hadoop, and it served as a command-line tool for Hive Server (the original Hive server, now often referred to as "HiveServer1"). Hive Server was deprecated (HIVE-6977), removed from the Hive code base as of Hive 1.0.0, and replaced with HiveServer2 (HIVE-2935), so the second use case no longer applies. For the first use case, Beeline provides, or is intended to provide, equivalent functionality, but it is implemented differently from Hive CLI.

Ideally, Hive CLI would be deprecated, since the Hive community has long recommended the Beeline plus HiveServer2 configuration. Because Hive CLI is so widely used, however, we instead propose replacing its implementation with Beeline plus an embedded HiveServer2, so that the Hive community only needs to maintain a single code path. Hive CLI then becomes an alias for Beeline at both the shell script level and the code level. The goal is that existing user scripts that use Hive CLI require no changes, or only minimal ones.

Hive CLI Functionality Support

We use Beeline to implement the Hive CLI functionality. If an existing Hive CLI feature is not supported in the Beeline-based implementation, you can fall back to the deprecated Hive CLI tool by setting the following environment variable:

export USE_DEPRECATED_CLI=true

Note that the log4j configuration file has been changed to "beeline-log4j.properties".
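
As a minimal sketch, the USE_DEPRECATED_CLI fallback can also be scoped to a single command instead of being exported for the whole session. The helper name run_with_deprecated_cli and the query are hypothetical, and the call is guarded so the snippet does nothing harmful on a machine without Hive installed.

```shell
# Hypothetical helper: run one command with USE_DEPRECATED_CLI=true in its
# environment only, leaving the rest of the shell session untouched.
run_with_deprecated_cli() {
  USE_DEPRECATED_CLI=true "$@"
}

# Example call with a placeholder query.
if command -v hive >/dev/null 2>&1; then
  run_with_deprecated_cli hive -e 'show tables;'
else
  echo "hive not found on PATH; skipping example" >&2
fi
```

Scoping the variable this way makes it easier to compare the two implementations side by side, because other hive invocations in the same session still use the Beeline-based code path.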

Hive CLI Options Support

To get help, run "hive -H" or "hive --help".

usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
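
The options above can be combined in one invocation. The following sketch writes a small script file and runs it with -f, passing a variable with --hivevar. The table name test2 is borrowed from the sample output further down this page, the database name and file location are assumptions, and the hive calls are guarded so the snippet is safe to run where Hive is not installed.

```shell
# Create a throwaway script that refers to a Hive variable named tbl.
# The quoted heredoc delimiter keeps ${hivevar:tbl} literal in the file.
workdir=$(mktemp -d)
cat > "$workdir/query.sql" <<'EOF'
SELECT COUNT(*) FROM ${hivevar:tbl};
EOF

if command -v hive >/dev/null 2>&1; then
  # Silent mode, explicit database, variable substitution, SQL from a file.
  hive -S --database default --hivevar tbl=test2 -f "$workdir/query.sql"
  # SQL given directly on the command line.
  hive -e 'show tables;'
else
  echo "hive not found on PATH; skipping example" >&2
fi
```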

Hive CLI Interactive Shell Commands Support

Example of the source command:

hive> source /root/test.sql;
hive> show tables;
numbers_bucketed
test2
testavro2

Hive CLI Configuration Support

Configuration Name	Supported in New CLI
hive.cli.errors.ignore	Yes
hive.cli.prompt	Yes
hive.cli.pretty.output.num.cols	No
hive.cli.print.current.db	No
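
The supported settings in the table can be passed on the command line with the --hiveconf option. The sketch below wraps that in a small helper. The function name hive_ignore_errors is hypothetical, the script path is the one used in the source example above, and the comment about the setting's effect reflects our understanding rather than anything stated on this page.

```shell
# Hypothetical helper: run hive with hive.cli.errors.ignore enabled, which,
# as we understand it, lets a script continue past a failing statement.
# Any extra arguments are passed through unchanged.
hive_ignore_errors() {
  hive --hiveconf hive.cli.errors.ignore=true "$@"
}

# Example call, guarded so the snippet is harmless where Hive is not installed.
if command -v hive >/dev/null 2>&1; then
  hive_ignore_errors -f /root/test.sql
else
  echo "hive not found on PATH; skipping example" >&2
fi
```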

Performance Impacts

Using JMH (the Java Microbenchmark Harness) to measure the average time taken to retrieve a data set, we obtained the following results.

Benchmark                                                       Mode  Samples           Score   Error  Units
o.a.h.b.c.CliBench.BeeLineDriverBench.testSQLWithInitialFile    avgt        1  1713326099.000 ±  NaN  ns/op
o.a.h.b.c.CliBench.CliDriverBench.testSQLWithInitialFile        avgt        1  1852995786.000 ±  NaN  ns/op

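Lower scores are better, since the benchmark measures time per operation. The Beeline-based driver took about 1.71 seconds per operation and the original CliDriver about 1.85 seconds. Each result comes from a single sample with no error estimate, so there is no clear performance gap between the two when retrieving data.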
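
To put the two scores in perspective, the relative difference can be worked out directly from the numbers in the table. This is only an arithmetic sketch using shell integer math (it assumes the shell uses 64-bit integers, as is usual on current systems). The variable names are our own.

```shell
# Scores copied from the JMH table above, in nanoseconds per operation.
beeline_ns=1713326099
clidriver_ns=1852995786

# Absolute difference, and the difference relative to CliDriver in
# hundredths of a percent (basis points), using integer division.
diff_ns=$((clidriver_ns - beeline_ns))
diff_bp=$((diff_ns * 10000 / clidriver_ns))

printf 'BeeLineDriver: %d ms/op\n' "$((beeline_ns / 1000000))"
printf 'CliDriver:     %d ms/op\n' "$((clidriver_ns / 1000000))"
printf 'Difference:    %d ms/op (%d.%02d%% of CliDriver)\n' \
  "$((diff_ns / 1000000))" "$((diff_bp / 100))" "$((diff_bp % 100))"
```

The difference comes to roughly 140 ms per operation, or about 7.5% of the CliDriver score, which is small enough that a single-sample run cannot separate it from noise.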
