Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Add overview of how to use JDBC (thanks, Thejas) and fix a typo

...

HiveServer2 has a new JDBC driver.  It It supports both embedded and remote access to HiveServer2.

Connection URL for Remote or Embedded Mode

The JDBC connection URL format has the prefix jdbc:hive2:// and the Driver class is org.apache.hive.jdbc.HiveDriver. Note that this is different from the old HiveServer.

  • For a remote server, the URL format is jdbc:hive2://<host>:<port>/<db> (default port for HiveServer2 is 10000).
  • For an embedded server, the URL format is jdbc:hive2:// (no host or port).

Using JDBC

You can use JDBC to access data stored in a relational database or other tabular format.

  1. Load the HiveServer2 JDBC driver.

    For example:
    Code Block
    
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    
  2. Connect to the database by creating a Connection object with the JDBC driver.

    For example:
    Code Block
    
    Connection cnct = DriverManager.getConnection("jdbc:hive2://<host>:<port>", "<user>", "<password>");
    
    The default <port> is 10000. In non-secure configurations, specify a <user> for the query to run as. The <password> field value is ignored in non-secure mode.
    Code Block
    
    Connection cnct = DriverManager.getConnection("jdbc:hive2://<host>:<port>", "<user>", "");
    
    In Kerberos secure mode, the user information is based on the Kerberos credentials.

  3. Submit SQL to the database by creating a Statement object and using its executeQuery() method.

    For example:
    Code Block
    
    Statement stmt = cnct.createStatement();
    ResultSet rset = stmt.executeQuery("SELECT foo FROM bar");
    
  4. Process the result set, if necessary.

These steps are illustrated in the sample code below.

JDBC Client Sample Code

Code Blocknoformat


import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcClient {
  private static String driverName = "org.apache.hive.jdbc.HiveDriver";

  /**
 * @param args
 * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
      try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
      System.exit(1);
    }
    //replace "hive" here with the name of the user the queries should run as
    Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = con.createStatement();
    String tableName = "testHiveDriverTable";
    stmt.execute("drop table if exists " + tableName);
    stmt.execute("create table " + tableName + " (key int, value string)");
    // show tables
    String sql = "show tables '" + tableName + "'";
    System.out.println("Running: " + sql);
    ResultSet res = stmt.executeQuery(sql);
    if (res.next()) {
      System.out.println(res.getString(1));
    }
       // describe table
    sql = "describe " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(res.getString(1) + "\t" + res.getString(2));
    }

    // load data into table
    // NOTE: filepath has to be local to the hive server
    // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
    String filepath = "/tmp/a.txt";
    sql = "load data local inpath '" + filepath + "' into table " + tableName;
    System.out.println("Running: " + sql);
    stmt.execute(sql);

    // select * query
    sql = "select * from " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
    }

    // regular hive query
    sql = "select count(1) from " + tableName;
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(res.getString(1));
    }
  }
}

Running the JDBC Sample Code

Code Blocknoformat
{noformat}
# Then on the command-line
$ javac HiveJdbcClient.java

# To run the program in using remote hiveserver in non-kerberos mode, we need the following jars in the classpath
# from hive/build/dist/lib
#     hive-jdbc*.jar
#     hive-service*.jar
#     libfb303-0.9.0.jar#  libthrift-0.9.0.jar# log4j-1.2.16.jar# slf4j-api-1.6.1.jar# slf4j-log4j12-1.6.1.jar# commons-logging-1.0.4.jar#
#
# Following additional jars are needed for the kerberos secure mode -
#     hive-exec*.jar
#     commons-configuration-1.6.jar
#  and from hadoop - hadoop-*core.jar
# To run the program in embedded mode, we need the following additional jars in the classpath
# from hive/build/dist/lib
#     hive-exec*.jar
#     hive-metastore*.jar
#     antlr-runtime-3.0.1.jar
#     derby.jar
#     jdo2-api-2.1.jar
#     jpox-core-1.2.2.jar
#     jpox-rdbms-1.2.2.jar
#
# from hadoop/build
#     hadoop-*-core.jar
# as well as hive/build/dist/conf, any HIVE_AUX_JARS_PATH set, and hadoop jars necessary to run MR jobs (eg lzo codec)

$ java -cp $CLASSPATH HiveJdbcClient

# Alternatively, you can run the following bash script, which will seed the data file
# and build your classpath before invoking the client. The script adds all the 
# additional jars needed for using HiveServer2 in embedded mode as well. 

#!/bin/bash
HADOOP_HOME=/your/path/to/hadoop
HIVE_HOME=/your/path/to/hive

echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt

HADOOP_CORE={{ls $HADOOP_HOME/hadoop-*-core.jar}}
CLASSPATH=.:$HIVE_HOME/conf:`hadoop classpath`

for i in ${HIVE_HOME}/lib/*.jar ; do
    CLASSPATH=$CLASSPATH:$i
done

java -cp $CLASSPATH HiveJdbcClient


{noformat}

JDBC Client Setup for a Secure Cluster

...