
Hive CLI

$HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in either interactive or batch mode.

Hive Command Line Options

To get help, run "hive -H" or "hive --help".
Usage (as it is in Hive 0.9.0):

Code Block
usage: hive
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
 -h <hostname>                    Connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        Connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
Info: Version information

As of Hive 0.10.0 there is one additional command line option:

Code Block
--database               Specify the database to use
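
For example, to run a query against a particular database from the command line (the database name testdb here is hypothetical):

Code Block
   $HIVE_HOME/bin/hive --database testdb -e 'show tables;'
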
  • Example of running a query from the command line
    Code Block
       $HIVE_HOME/bin/hive -e 'select a.col from tab1 a'
       
  • Example of setting Hive configuration variables
    Code Block
   $HIVE_HOME/bin/hive -e 'select a.col from tab1 a' --hiveconf hive.exec.scratchdir=/home/my/hive_scratch --hiveconf mapred.reduce.tasks=32
       
  • Example of dumping data out from a query into a file using silent mode
    Code Block
   $HIVE_HOME/bin/hive -S -e 'select a.col from tab1 a' > a.txt
       
  • Example of running a script non-interactively
    Code Block
   $HIVE_HOME/bin/hive -f /home/my/hive-script.sql
       
  • Example of running an initialization script before entering interactive mode
    Code Block
   $HIVE_HOME/bin/hive -i /home/my/hive-init.sql
       

...

The hiverc File

When invoked without the -i option, the CLI will attempt to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.
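
As a minimal sketch, a .hiverc file can contain any commands the CLI accepts, one per line (the jar path and settings below are illustrative, not defaults):

Code Block
   add JAR /usr/local/hive/lib/my-udfs.jar;
   set hive.cli.print.current.db=true;
   set mapred.reduce.tasks=16;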

Hive Batch Mode Commands

When $HIVE_HOME/bin/hive is run with the -e or -f option, it executes SQL commands in batch mode.

  • hive -e '<query-string>' executes the query string.
  • hive -f <filepath> executes one or more SQL queries from a file.

Examples are shown above.

Hive Interactive Shell Commands

When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters interactive shell mode.

Use ";" (semicolon) to terminate commands. Comments in scripts can be specified using the "--" prefix.

Command

Description

quit
exit

Use quit or exit to leave the interactive shell.

reset

Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202).

set <key>=<value>

Sets the value of a particular configuration variable (key).
Note: If you misspell the variable name, the CLI will not show an error.

set

Prints a list of configuration variables that are overridden by the user or Hive.

set -v

Prints all Hadoop and Hive configuration variables.

add FILE[S] <filepath> <filepath>*
add JAR[S] <filepath> <filepath>*
add ARCHIVE[S] <filepath> <filepath>*

Adds one or more files, jars, or archives to the list of resources in the distributed cache.

list FILE[S]
list JAR[S]
list ARCHIVE[S]

Lists the resources already added to the distributed cache.

list FILE[S] <filepath>*
list JAR[S] <filepath>*
list ARCHIVE[S] <filepath>*

Checks whether the given resources are already added to the distributed cache.

delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*

Removes the resource(s) from the distributed cache.

! <command>

Executes a shell command from the Hive shell.

dfs <dfs command>

Executes a dfs command from the Hive shell.

<query string>

Executes a Hive query and prints results to standard output.

source FILE <filepath>

Executes a script file inside the CLI.

Sample Usage:

Code Block
  hive> set mapred.reduce.tasks=32;
  hive> set;
  hive> select a.* from tab1 a;
  hive> !ls;
  hive> dfs -ls;
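
The source command from the table above can run a script from inside the CLI as well (the script path here is hypothetical):

Code Block
  hive> source /home/my/hive-script.sql;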

...

Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead captured to a log file specified by Hive's log4j properties file. By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation, which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.

It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:

Code Block
 $HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console

hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to the standard error (instead of the log file).
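
For example, a more verbose variant for debugging raises the level to DEBUG (a sketch; any valid log4j level can be substituted):

Code Block
 $HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console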

...

Hive can manage the addition of resources to a session where those resources need to be made available at query execution time. The resources can be files, jars, or archives. Any locally accessible file can be added to the session.

Once a resource is added to a session, Hive queries can refer to it by its name (in map/reduce/transform clauses), and the resource is available locally at execution time on the entire Hadoop cluster. Hive uses Hadoop's Distributed Cache to distribute the added resources to all the machines in the cluster at query execution time.

...

Code Block
   ADD { FILE[S] | JAR[S] | ARCHIVE[S] } <filepath1> [<filepath2>]*
   LIST { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath1> <filepath2> ..]
   DELETE { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath1> <filepath2> ..]
 
  • FILE resources are just added to the distributed cache. Typically, this might be something like a transform script to be executed.
  • JAR resources are also added to the Java classpath. This is required in order to reference objects they contain, such as UDFs (see the sketch after this list).
  • ARCHIVE resources are automatically unarchived as part of distributing them.
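
As an illustration of the JAR case, a sketch of registering a UDF from an added jar (the jar path, class name, and table are hypothetical):

Code Block
  hive> add JAR /tmp/my-udfs.jar;
  hive> create temporary function my_lower as 'com.example.udf.MyLower';
  hive> select my_lower(a.col) from tab1 a;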

...

Code Block
  hive> add FILE /tmp/tt.py;
  hive> list FILES;
  /tmp/tt.py
  hive> from networks a
          map a.networkid
          using 'python tt.py' as nn
        where a.ds = '2009-01-04' limit 10;

It is not necessary to add files to the session if the files used in a transform script are already available on all machines in the Hadoop cluster under the same path name. For example:

  • ... MAP a.networkid USING 'wc -l' ...
    Here wc is an executable available on all machines.
  • ... MAP a.networkid USING '/home/nfsserv1/hadoopscripts/tt.py' ...
    Here tt.py may be accessible via an NFS mount point that's configured identically on all the cluster nodes.