
Hive CLI

$HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in either interactive or batch mode.
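For example (a quick sketch using the same placeholder query and script path that appear in the examples below), invoking the CLI with no arguments starts the interactive shell, while the -e and -f options run it in batch mode:

No Format

   $HIVE_HOME/bin/hive
   $HIVE_HOME/bin/hive -e 'select a.col from tab1 a'
   $HIVE_HOME/bin/hive -f /home/my/hive-script.sql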

...

To get help, run "hive -H" or "hive --help".
Usage (as displayed in Hive 0.9.0):

No Format

usage: hive
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
 -h <hostname>                    Connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        Connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
Version information

As of Hive 0.10.0 there is one additional command line option:

No Format

--database <dbname>      Specify the database to use
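
For example, to start the CLI with a particular database selected (a minimal sketch; the database name mydb is only a placeholder and must already exist):

No Format

   $HIVE_HOME/bin/hive --database mydb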

Note: The single-dash variant "-hiveconf" is supported as well as "--hiveconf".

Examples

See Variable Substitution for examples of using the hiveconf option.
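As a minimal illustration (the variable name and table name below are placeholders), a value passed with --hivevar can be referenced in the query text with the ${hivevar:...} syntax:

No Format

   $HIVE_HOME/bin/hive --hivevar tablename=tab1 -e 'select a.col from ${hivevar:tablename} a'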

  • Example of running a query from the command line

    No Format
    
       $HIVE_HOME/bin/hive -e 'select a.col from tab1 a'
       
  • Example of setting Hive configuration variables

    No Format
    
       $HIVE_HOME/bin/hive -e 'select a.col from tab1 a' --hiveconf hive.exec.scratchdir=/home/my/hive_scratch  --hiveconf mapred.reduce.tasks=32
       
  • Example of dumping data out from a query into a file using silent mode

    No Format
    
       $HIVE_HOME/bin/hive -S -e 'select a.col from tab1 a' > a.txt
       
  • Example of running a script non-interactively

    No Format
    
       $HIVE_HOME/bin/hive -f /home/my/hive-script.sql
       
  • Example of running an initialization script before entering interactive mode

    No Format
    
       $HIVE_HOME/bin/hive -i /home/my/hive-init.sql
       

...

Command and Description

quit
exit
    Use quit or exit to leave the interactive shell.

reset
    Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202).

set <key>=<value>
    Sets the value of a particular configuration variable (key).
    Note: If you misspell the variable name, the CLI will not show an error.

set
    Prints a list of configuration variables that are overridden by the user or Hive.

set -v
    Prints all Hadoop and Hive configuration variables.

add FILE[S] <filepath> <filepath>*
add JAR[S] <filepath> <filepath>*
add ARCHIVE[S] <filepath> <filepath>*
    Adds one or more files, jars, or archives to the list of resources in the distributed cache.

list FILE[S]
list JAR[S]
list ARCHIVE[S]
    Lists the resources already added to the distributed cache.

list FILE[S] <filepath>*
list JAR[S] <filepath>*
list ARCHIVE[S] <filepath>*
    Checks whether the given resources are already added to the distributed cache or not.

delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*
    Removes the resource(s) from the distributed cache.

! <command>
    Executes a shell command from the Hive shell.

dfs <dfs command>
    Executes a dfs command from the Hive shell.

<query string>
    Executes a Hive query and prints results to standard output.

source FILE <filepath>
    Executes a script file inside the CLI.

Sample Usage:

No Format

  hive> set mapred.reduce.tasks=32;
  hive> set;
  hive> select a.* from tab1 a;
  hive> !ls;
  hive> dfs -ls;

...

It is often desirable to emit the logs to standard output and/or to change the logging level for debugging purposes. Both can be done from the command line as follows:

No Format

 $HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
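
The same property also controls the logging level, so a more verbose level can be requested in the same way (a sketch; DEBUG is one of the standard log4j levels):

No Format

 $HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console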

...

Once a resource is added to a session, Hive queries can refer to it by its name (in map/reduce/transform clauses) and the resource is available locally at execution time on the entire Hadoop cluster. Hive uses Hadoop's Distributed Cache to distribute the added resources to all the machines in the cluster at query execution time.

Usage:

No Format

   ADD { FILE[S] | JAR[S] | ARCHIVE[S] } <filepath1> [<filepath2>]*
   LIST { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath1> <filepath2> ..]
   DELETE { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath1> <filepath2> ..] 
  • FILE resources are just added to the distributed cache. Typically, this might be something like a transform script to be executed.
  • JAR resources are also added to the Java classpath. This is required in order to reference objects they contain such as UDFs.
  • ARCHIVE resources are automatically unarchived as part of distributing them.

Example:

No Format

  hive> add FILE /tmp/tt.py;
  hive> list FILES;
  /tmp/tt.py
  hive> from networks a
               MAP a.networkid
               USING 'python tt.py' as nn
               where a.ds = '2009-01-04' limit 10;
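
A similar sketch for JAR resources (the jar path, class name, and function name below are hypothetical): once a jar has been added it is on the classpath, so a UDF class it contains can be registered and used in queries:

No Format

  hive> add JAR /tmp/my-udfs.jar;
  hive> list JARS;
  /tmp/my-udfs.jar
  hive> create temporary function my_lower as 'com.example.hive.udf.MyLower';
  hive> select my_lower(a.col) from tab1 a limit 10;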

...