
...

  • trunk/conf - This directory contains the packaged hive-default.xml and hive-site.xml.
  • trunk/data - This directory contains some data sets and configurations used in the Hive tests.
  • trunk/ivy - This directory contains the Ivy files used by the build infrastructure to manage dependencies on different Hadoop versions.
  • trunk/lib - This directory contains the run time libraries needed by Hive.
  • trunk/testlibs - This directory contains the junit.jar used by the JUnit target in the build infrastructure.
  • trunk/testutils (Deprecated)

...

Note: Ant to Maven

As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.

See the Hive Developer FAQ for updated instructions.

Hive can be made to compile against different versions of Hadoop.

...

Layout of the unit tests

Hive uses JUnit for unit tests. Each of the three main components of Hive has its unit test implementations in the corresponding src/test directory: for example, trunk/metastore/src/test has all the unit tests for the metastore, trunk/serde/src/test has all the unit tests for serde, and trunk/ql/src/test has all the unit tests for the query processor. The metastore and serde unit tests provide the TestCase implementations for JUnit. The query processor tests, on the other hand, are generated using Velocity. The main directories under trunk/ql/src/test that contain these tests and the corresponding results are as follows:

  • Test Queries:
    • queries/clientnegative - This directory contains the query files (.q files) for the negative test cases. These are run through the CLI classes and therefore test the entire query processor stack.
    • queries/clientpositive - This directory contains the query files (.q files) for the positive test cases. These are run through the CLI classes and therefore test the entire query processor stack.
    • queries/positive (Will be deprecated) - This directory contains the query files (.q files) for the positive test cases for the compiler. These only test the compiler and do not run the execution code.
    • queries/negative (Will be deprecated) - This directory contains the query files (.q files) for the negative test cases for the compiler. These only test the compiler and do not run the execution code.
  • Test Results:
    • results/clientnegative - The expected results from the queries in queries/clientnegative.
    • results/clientpositive - The expected results from the queries in queries/clientpositive.
    • results/compiler/errors - The expected results from the queries in queries/negative.
    • results/compiler/parse - The expected Abstract Syntax Tree output for the queries in queries/positive.
    • results/compiler/plan - The expected query plans for the queries in queries/positive.
  • Velocity Templates to Generate the Tests:
    • templates/TestCliDriver.vm - Generates the tests from queries/clientpositive.
    • templates/TestNegativeCliDriver.vm - Generates the tests from queries/clientnegative.
    • templates/TestParse.vm - Generates the tests from queries/positive.
    • templates/TestParseNegative.vm - Generates the tests from queries/negative.
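As a concrete illustration of the layout above, here is a hedged sketch of where the files for a new positive CLI test would live; the test name mytest and its query are made up for this example:

```shell
# Illustrative only: mytest and its query are hypothetical.
# Create the query file for a new positive CLI test:
mkdir -p ql/src/test/queries/clientpositive
echo "SELECT 1 FROM src;" > ql/src/test/queries/clientpositive/mytest.q
# The expected output for the test would then be checked in as:
#   ql/src/test/results/clientpositive/mytest.q.out
```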

Tables in the unit tests

Running unit tests


Note: Ant to Maven

As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.

See the Hive Developer FAQ for updated instructions.

Run all tests:

Code Block
ant package test

Run all positive test queries:

Code Block
ant test -Dtestcase=TestCliDriver

Run a specific positive test query:
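One common shape for this, sketched here by composing the command string used in the server-side debugging section later in this guide (groupby1.q is an illustrative query file name):

```shell
# Sketch: compose the single-test command; groupby1.q is illustrative.
TESTCASE=TestCliDriver
QFILE=groupby1.q
CMD="ant test -Dtestcase=${TESTCASE} -Dqfile=${QFILE}"
echo "$CMD"   # run this from the top of the Hive checkout
```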

...


Hive code includes both client-side code (e.g., compiler, semantic analyzer, and optimizer of HiveQL) and server-side code (e.g., operator/task/SerDe implementations). Debugging is different for client-side and server-side code, as described below.

Debugging Client-Side Code

The client-side code runs on your local machine, so you can easily debug it in Eclipse the same way as any regular local Java code. Here are the steps to debug code within a unit test.

  • Make sure that you have run ant model-jar in hive/metastore and ant gen-test in hive since the last time you ran ant clean.
  • To run all of the unit tests for the CLI:
    • Open up TestCliDriver.java.
    • Click Run->Debug Configurations, select TestCliDriver, and click Debug.
  • To run a single test within TestCliDriver.java:
    • Begin running the whole TestCli suite as before.
    • Once it finishes the setup and starts executing the JUnit tests, stop the test execution.
    • Find the desired test in the JUnit pane.
    • Right click on that test and select Debug.
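The prerequisite builds from the first step above can be sketched from the shell; this is guarded so it is a no-op outside a Hive checkout with Ant installed:

```shell
# Guarded sketch of the prerequisite builds (no-op without ant or a checkout).
HIVE_PREREQS_DONE=no
if command -v ant >/dev/null 2>&1 && [ -d metastore ]; then
  (cd metastore && ant model-jar)   # rebuild the metastore model jar
  ant gen-test                      # regenerate the Velocity-generated tests
  HIVE_PREREQS_DONE=yes
fi
```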

Debugging Server-Side Code

The server-side code is distributed and runs on the Hadoop cluster, so debugging server-side Hive code is a little more complicated. In addition to printing to log files using log4j, you can also attach the debugger to a different JVM under unit test (single machine mode). Below are the steps to debug server-side code.

  • Compile Hive code with javac.debug=on. Under Hive checkout directory:

    Code Block
        > ant -Djavac.debug=on package
    

    If you have already built Hive without javac.debug=on, you can clean the build and then run the above command.

    Code Block
    > ant clean  # not necessary if compiling for the first time
        > ant -Djavac.debug=on package
    
  • Run ant test with additional options to tell the JVM that runs Hive's server-side code to wait for the debugger to attach. First define some convenient macros for debugging; you can put them in your .bashrc or .cshrc.

    Code Block
        > export HIVE_DEBUG_PORT=8000
        > export HIVE_DEBUG="-Xdebug -Xrunjdwp:transport=dt_socket,address=${HIVE_DEBUG_PORT},server=y,suspend=y"
    

    In particular, HIVE_DEBUG_PORT is the port number that the JVM listens on and that the debugger will attach to. Then run the unit test as follows:

    Code Block
        > export HADOOP_OPTS=$HIVE_DEBUG
        > ant test -Dtestcase=TestCliDriver -Dqfile=<mytest>.q
    

    The unit test will run until it shows:

    Code Block
         [junit] Listening for transport dt_socket at address: 8000
    
  • Now you can use jdb to attach to port 8000 and debug:

    Code Block
        > jdb -attach 8000
    

    or, if you are running Eclipse and the Hive projects are already imported, you can debug with Eclipse. Under Eclipse Run -> Debug Configurations, find "Remote Java Application" at the bottom of the left panel. There should already be a MapRedTask configuration. If there is no such configuration, you can create one with the following properties:

    • Name: any name, such as MapRedTask
    • Project: the Hive project that you imported.
    • Connection Type: Standard (Socket Attach)
    • Connection Properties:
      • Host: localhost
      • Port: 8000
        Then hit the "Debug" button; Eclipse will attach to the JVM listening on port 8000 and continue running to the end. If you define breakpoints in the source code before hitting the "Debug" button, execution will stop there. The rest is the same as debugging client-side Hive.

Debugging without Ant (Client and Server Side)

There is another way of debugging Hive code without going through Ant.
You need to install Hadoop and set the environment variable HADOOP_HOME to the installation directory.

...

It will then act similarly to the debugging steps outlined in Debugging Hive Code. It is faster since there is no need to compile Hive code
and go through Ant. It can be used to debug both client-side and server-side Hive.
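For example, here is a hedged sketch of the environment one might set up before starting Hive without Ant; /path/to/hadoop is a placeholder, and the JDWP options mirror the HIVE_DEBUG macro defined earlier in this guide:

```shell
# Placeholder path; point HADOOP_HOME at your Hadoop installation.
export HADOOP_HOME=/path/to/hadoop
export HIVE_DEBUG_PORT=8000
# Same JDWP options as the HIVE_DEBUG macro defined earlier in this guide.
export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=${HIVE_DEBUG_PORT},server=y,suspend=y"
# Starting bin/hive now would wait for a debugger to attach on port 8000.
```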

If you want to debug a particular query, start Hive and perform the steps needed before that query. Then start Hive again in debug mode to debug that query.

...

Note that the local file system will be used, so the space on your machine will not be released automatically (unlike debugging via Ant, where the tables created in tests are automatically dropped at the end of the test). Make sure to either drop the tables explicitly, or drop the data from /User/hive/warehouse.
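A guarded sketch of the explicit cleanup; the table name is illustrative, and the command is a no-op on machines without the hive CLI:

```shell
# Drop test tables explicitly after a debug session (table name illustrative);
# guarded so this is a no-op on machines without the hive CLI.
CLEANUP_STATUS=skipped
if command -v hive >/dev/null 2>&1; then
  hive -e 'DROP TABLE IF EXISTS my_debug_table;'
  CLEANUP_STATUS=done
fi
echo "cleanup: $CLEANUP_STATUS"
```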

...