Versions Compared

Comment: minor edits in Hive CLI Functionality Support

Why Replace the Existing Hive CLI?

Hive CLI is a legacy tool which had two main use cases. The first is that it served as a thick client for SQL on Hadoop, and the second is that it served as a command line tool for Hive Server (the original Hive server, now often referred to as "HiveServer1"). Hive Server has been deprecated and removed from the Hive code base as of Hive 1.0.0 (HIVE-6977) and replaced with HiveServer2 (HIVE-2935), so the second use case no longer applies. For the first use case, Beeline provides or is supposed to provide equal functionality, yet is implemented differently from Hive CLI.

Ideally, Hive CLI should be deprecated, as the Hive community has long recommended using the Beeline plus HiveServer2 configuration. However, because of the wide use of Hive CLI, we are instead replacing Hive CLI's implementation with a new Hive CLI on top of Beeline plus embedded HiveServer2 (HIVE-10511) so that the Hive community only needs to maintain a single code path. In this way, the new Hive CLI is just an alias to Beeline at both the shell script level and the high code level. The goal is that no changes or minimal changes are required from existing user scripts using Hive CLI.

...

Hive CLI Functionality Support

We use a new Hive CLI on top of Beeline to implement the Hive CLI functionality. Since some existing Hive CLI features are not supported in the new Hive CLI, the old Hive client implementation is used by default. Use the following command to select the new Beeline-based Hive CLI tool:

No Format
export USE_DEPRECATED_CLI=false

Note that the log4j configuration file has been changed to "beeline-log4j.properties".
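The effect of the USE_DEPRECATED_CLI variable can be sketched as a small shell function. This is only an illustration: hive_cli_mode is a hypothetical helper, not part of the Hive distribution, and it assumes that any value other than "false" (including an unset variable) keeps the old implementation, in line with the default stated above.

```shell
# Hypothetical helper (not part of Hive): report which implementation
# would be selected for the current value of USE_DEPRECATED_CLI.
# Assumption: only the literal value "false" switches to the new CLI.
hive_cli_mode() {
  if [ "${USE_DEPRECATED_CLI:-true}" = "false" ]; then
    echo "new Beeline-based Hive CLI"
  else
    echo "old Hive CLI"
  fi
}

export USE_DEPRECATED_CLI=false
hive_cli_mode
```

Because the variable is exported, it applies to every hive invocation started from the same shell session; unset it to return to the default behavior.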

Hive CLI Options Support

To get help, run "hive -H" or "hive --help".

No Format
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)

Examples

  • Example of running a query from the command line

    No Format
    $HIVE_HOME/bin/hive -e 'select a.foo from pokes a'
  • Example of setting Hive configuration variables

    No Format
    $HIVE_HOME/bin/hive -e 'select a.foo from pokes a' --hiveconf hive.exec.scratchdir=/opt/my/hive_scratch --hiveconf mapred.reduce.tasks=1
  • Example of dumping data out from a query into a file using silent mode

    No Format
    $HIVE_HOME/bin/hive -S -e 'select a.foo from pokes a' > a.txt
  • Example of running a script non-interactively from local disk

    No Format
    $HIVE_HOME/bin/hive -f /home/my/hive-script.sql
  • Example of running a script non-interactively from a Hadoop supported filesystem (starting in Hive 0.14)

    No Format
    $HIVE_HOME/bin/hive -f hdfs://<namenode>:<port>/hive-script.sql
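The examples above do not cover --database or --hivevar, so the following sketch combines them with -S and -e. It is a dry run under stated assumptions: the HIVE variable and the run_query function are hypothetical names introduced here, the table name pokes is reused from the examples above, and the database is assumed to be "default". The ${hivevar:tbl} reference relies on Hive's variable substitution.

```shell
# HIVE defaults to "echo hive", a dry run that only prints the command line,
# so the sketch runs without a Hive installation.
# Set HIVE="$HIVE_HOME/bin/hive" to run the query for real.
HIVE="${HIVE:-echo hive}"

# run_query <database> <table>: hypothetical wrapper around the hive launcher.
# Single quotes keep the shell from expanding ${hivevar:tbl}; Hive is
# expected to replace it with the value passed through --hivevar.
run_query() {
  $HIVE -S --database "$1" --hivevar "tbl=$2" \
    -e 'select a.foo from ${hivevar:tbl} a'
}

run_query default pokes
```

In dry-run mode the function simply echoes the assembled arguments, which is a convenient way to check quoting before pointing HIVE at a real installation.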

Hive CLI Interactive Shell Commands Support

When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters interactive shell mode.

Use ";" (semicolon) to terminate commands. Comments in scripts can be specified using the "--" prefix.

Command

Description

quit
exit

Use quit or exit to leave the interactive shell.

reset

Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202).

set <key>=<value>

Sets the value of a particular configuration variable (key).
Note: If you misspell the variable name, the CLI will not show an error.

set

Prints a list of configuration variables that are overridden by the user or Hive.

set -v

Prints all Hadoop and Hive configuration variables.

add FILE[S] <filepath> <filepath>*
add JAR[S] <filepath> <filepath>*
add ARCHIVE[S] <filepath> <filepath>*

Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information.

add FILE[S] <ivyurl> <ivyurl>*
add JAR[S] <ivyurl> <ivyurl>*
add ARCHIVE[S] <ivyurl> <ivyurl>*

As of Hive 1.2.0, adds one or more files, jars, or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information.

list FILE[S]
list JAR[S]
list ARCHIVE[S]

Lists the resources already added to the distributed cache. See Hive Resources for more information.

list FILE[S] <filepath>*
list JAR[S] <filepath>*
list ARCHIVE[S] <filepath>*

Checks whether the given resources have already been added to the distributed cache. See Hive Resources for more information.

delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*

Removes the resource(s) from the distributed cache.

delete FILE[S] <ivyurl> <ivyurl>*
delete JAR[S] <ivyurl> <ivyurl>*
delete ARCHIVE[S] <ivyurl> <ivyurl>*

As of Hive 1.2.0, removes the resource(s) which were added using the <ivyurl> from the distributed cache. See Hive Resources for more information.

! <command>

Executes a shell command from the Hive shell.

dfs <dfs command>

Executes a dfs command from the Hive shell.

<query string>

Executes a Hive query and prints results to standard output.

source FILE <filepath>

Executes a script file inside the CLI.

Examples of shell commands:

No Format
hive> source /root/test.sql;
hive> show tables;
numbers_bucketed
test2
testavro2

...

;
test1
test2
hive> exit;
hive> quit;
hive> set;
hive> set hive.cli.print.header=true;
hive> set -v;
hive> reset;
hive> add file /opt/a.txt;
Added resources: [/opt/a.txt]
hive> list files;
/opt/a.txt
hive> delete file /opt/a.txt;
hive> add jar /usr/share/vnc/classes/vncviewer.jar;
Added [/usr/share/vnc/classes/vncviewer.jar]to class path
Added resources:[/usr/share/vnc/classes/vncviewer.jar]
hive> list jars;
/usr/share/vnc/classes/vncviewer.jar
hive> delete jar /usr/share/vnc/classes/vncviewer.jar;
hive> !ls;
bin
conf
hive> dfs -ls / ;
Found 2 items
drwx-wx-wx  - root supergroup  0   2015-08-12 19:06 /tmp
drwxr-xr-x  - root supergroup  0   2015-08-12 19:43 /user
hive> select * from pokes; 
OK
pokes.foo   pokes.bar
238         val_238
86          val_86
311         val_311
hive> source /opt/s.sql;

Hive CLI Configuration Support

Configuration Name | Supported in New Hive CLI | Description
hive.cli.print.header | Yes | Whether to print the names of the columns in query output. HIVE-11624
hive.cli.errors.ignore | Yes | Whether to force execution of a script when errors occur. HIVE-11191
hive.cli.prompt | Yes | Command line prompt configuration value. Other hiveconf can be used in this configuration value. HIVE-11226
hive.cli.pretty.output.num.cols | Yes | The number of columns to use when formatting output generated by the DESCRIBE PRETTY table_name command. HIVE-11779
hive.cli.print.current.db | Yes | Whether to include the current database in the Hive prompt. HIVE-11637
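One way to apply several of these settings together is an initialization file passed with the -i option. The sketch below is a dry run under stated assumptions: the HIVE variable and the temporary init file are hypothetical names introduced here, and the chosen values are examples only.

```shell
# HIVE defaults to "echo hive", a dry run that only prints the command line,
# so the sketch runs without a Hive installation.
# Set HIVE="$HIVE_HOME/bin/hive" to start a real session.
HIVE="${HIVE:-echo hive}"

# Write the settings into a temporary initialization file.
init_file="$(mktemp)"
cat > "$init_file" <<'EOF'
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
EOF

# -i runs the file before the session starts, so the settings apply to
# every statement that follows.
$HIVE -i "$init_file"
```

The same settings could also be passed one at a time with --hiveconf, as in the options examples; an init file is easier to keep under version control when the list grows.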

Performance Impacts

Using JMH to measure the average time cost when retrieving a data set, we have the following results.

No Format
Benchmark                                                       Mode  Samples           Score   Error  Units
o.a.h.b.c.CliBench.BeeLineDriverBench.testSQLWithInitialFile    avgt        1  1713326099.000 ±  NaN  ns/op
o.a.h.b.c.CliBench.CliDriverBench.testSQLWithInitialFile        avgt        1  1852995786.000 ±  NaN  ns/op

The lower the score, the better, since we are evaluating the time cost. There is no clear performance gap in terms of retrieving data.
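To put the two scores in more familiar units, the following sketch converts them from nanoseconds to seconds and computes the relative difference. The function name compare_scores is a hypothetical helper introduced here. Each benchmark was run with a single sample and no error estimate, so the resulting figure should be read as a rough indication rather than a measured speedup.

```shell
# Scores copied from the JMH output above (ns/op, one sample each).
beeline_ns=1713326099
cli_ns=1852995786

# compare_scores <beeline_ns> <cli_ns>: print both scores in seconds and
# the difference as a percentage of the CliDriver score.
compare_scores() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    printf "%.3f s vs %.3f s, difference %.1f%%\n",
      b / 1000000000, c / 1000000000, (c - b) / c * 100
  }'
}

compare_scores "$beeline_ns" "$cli_ns"
# prints: 1.713 s vs 1.853 s, difference 7.5%
```

A difference of this size on a single run of under two seconds is small enough to be explained by run-to-run variation, which is consistent with the conclusion that there is no clear performance gap.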

...