Hive Developer FAQ
Info: Hive uses Maven as its build tool. Versions prior to 0.13 used Ant.
Developing
How do I add a new MiniDriver test?
See MiniDriver Tests for information about MiniDriver and Beeline tests.
How do I move some files?
Post a patch for testing purposes which simply does adds and deletes. SVN will not understand that these patches are actually moves, so upload the following, in order, so that the last upload is the patch for testing purposes:
- A patch which has only the non-move changes for commit, e.g. HIVE-XXX-for-commit.patch
- A script of the commands required to make the moves, e.g. HIVE-XXX-moves.sh
- A patch for testing purposes, e.g. HIVE-XXX.patch
The script should be a set of svn mv commands along with any perl commands required for find/replace. For example:
```
$ svn mv MyCLass.java MyClass.java
$ perl -i -pe 's@MyCLass@MyClass@g' MyClass.java
```
Building
- See Getting Started: Building Hive from Source for detailed information about building Hive releases 0.13 and later with Maven.
- See Installing from Source Code (Hive 0.12.0 and Earlier) for detailed information about building Hive 0.12 and earlier with Ant.
Maven settings
You might have to set the following Maven options on certain systems to get the build working: set MAVEN_OPTS to "-Xmx2g -XX:MaxPermSize=256M".
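For example, in a Bourne-style shell the options can be exported before invoking Maven. This is a minimal sketch using the values suggested above; note that -XX:MaxPermSize applies to JDK 7 and earlier, and is ignored on JDK 8+ where the Permanent Generation was removed:

```shell
# Give Maven a larger heap and PermGen before building.
# -XX:MaxPermSize only has an effect on JDK 7 and earlier.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=256M"
echo "$MAVEN_OPTS"
```

Put the export in your shell profile if you want it to apply to every build.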
How to build all source?
MVN:
```
mvn clean install -DskipTests -Phadoop-1
cd itests
mvn clean install -DskipTests -Phadoop-1
```
How to specify the Hadoop version?
In mvn commands, use -Phadoop-1 or -Phadoop-2 to specify the Hadoop version. Several examples are shown in these build instructions.
How do I import into Eclipse?
Build and generate Eclipse files (the conservative method):
```
$ mkdir workspace
$ cd workspace
$ git clone https://github.com/apache/hive.git
$ cd hive
$ mvn clean install -DskipTests -Phadoop-2
$ mvn eclipse:clean
$ mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs -Phadoop-2
$ cd itests
$ mvn clean install -DskipTests -Phadoop-2
$ mvn eclipse:clean
$ mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs -Phadoop-2
```
In Eclipse, define M2_REPO in Preferences -> Java -> Build Path -> Classpath Variables, setting it to one of:
Mac Example
```
/Users/$USER/.m2/repository
```
Linux Example
```
/home/$USER/.m2/repository
```
Windows Example
```
C:/users/$USER/.m2/repository
```
Then import the workspaces. If you get an error about "restricted use of Signal" for Beeline and CLI, follow these instructions.
Note that if you use the Hive git base directory as the Eclipse workspace, Eclipse does not pick up the right project names (for example, it picks 'ant' instead of 'hive-ant'). Therefore it's recommended to have the workspace directory one level up from the git directory, for example workspaces/hive-workspace/hive, where hive-workspace is the Eclipse workspace and hive is the git base directory.
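The recommended layout can be created along these lines (the directory names here are just examples, not requirements):

```shell
# Keep the Eclipse workspace one level above the git checkout.
mkdir -p workspaces/hive-workspace       # Eclipse workspace directory
cd workspaces/hive-workspace
git clone https://github.com/apache/hive.git   # git base dir: workspaces/hive-workspace/hive
```

Then point Eclipse's workspace at workspaces/hive-workspace and import the hive project from inside it.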
How to generate a tarball?
MVN:
```
mvn clean package -DskipTests -Phadoop-1 -Pdist
```
It will then be located in the packaging/target/ directory.
How to generate protobuf code?
MVN:
```
cd ql
mvn clean install -DskipTests -Phadoop-1,protobuf
```
How to generate Thrift code?
MVN:
```
mvn clean install -Phadoop-1,thriftif -DskipTests -Dthrift.home=/usr/local
```
How to compile ODBC?
MVN:
```
cd odbc
mvn compile -Phadoop-1,odbc -Dthrift.home=/usr/local -Dboost.home=/usr/local
```
How do I publish Hive artifacts to my local Maven repository?
Ant:
```
ant package
ant -Dmvn.publish.repo=local maven-build
ant -Dmvn.publish.repo=local maven-publish
```
MVN:
```
mvn clean install -DskipTests -Phadoop-1
cd itests
mvn clean install -DskipTests -Phadoop-1
```
Testing
For general information, see Unit Tests and Debugging in the Developer Guide.
Where is the log output of a test?
Logs are put in a couple of locations:
- From the root of the source tree: find . -name hive.log
- /tmp/$USER/ (Linux) or $TMPDIR/$USER/ (MacOS)
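From a shell, the locations above can be checked directly; this is a sketch, and the exact paths depend on your platform and test configuration:

```shell
# Search the build tree for per-test logs (run from the root of the source tree).
find . -name hive.log

# Default scratch locations; one or both may not exist on your machine.
ls /tmp/$USER/ 2>/dev/null        # Linux
ls "$TMPDIR/$USER/" 2>/dev/null   # MacOS
```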
How do I run a single test?
Warning: Note that any test in the itests directory needs to be executed from within the itests directory. The pom is disconnected from the parent project for technical reasons.
Single test class:
```
mvn test -Dtest=ClassName -Phadoop-1
```
Single test method:
```
mvn test -Dtest=ClassName#methodName -Phadoop-1
```
Note that a pattern can also be supplied to -Dtest to run multiple tests matching the pattern:
```
mvn test -Dtest='org.apache.hive.beeline.*' -Phadoop-1
```
For more usage see the documentation for the Maven Surefire Plugin.
Why isn't the itests pom connected to the root pom?
The qfile tests in itests require the packaging phase, but the Maven test phase runs after compile and before packaging. We could change the qfile tests to run during the integration-test phase using the "failsafe" plugin, but "failsafe" is different from Surefire and, in my opinion, hard to use. If you'd like to give that a try, by all means, go ahead.
How do I debug into a single test in Eclipse?
You can debug into a single JUnit test in Eclipse by first making sure you've built the Eclipse files and imported the project into Eclipse as described here. Then set one or more breakpoints, highlight the method name of the JUnit test method you want to debug into, and do Run -> Debug.
For more information about debugging, see Debugging Hive Code in the Developer Guide.
A test fails with a NullPointerException in MiniDFSCluster
If any test fails with the error below it means you have an inappropriate umask setting. It should be set to 0022.
```
java.lang.NullPointerException: null
    at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:426)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:284)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:124)
```
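You can check and fix the umask in the shell that runs the tests. This is a sketch for the current session; add the setting to your shell profile to make it permanent:

```shell
umask          # print the current mask; should be 0022
umask 0022     # set it for the current shell session
umask          # verify the new value
```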
How do I run all of the unit tests?
```
mvn test -Phadoop-2
cd itests
mvn test -Phadoop-2
```
Note that you need to have previously built and installed the jars:
```
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
```
Info: Make sure that your JAVA_HOME is appropriately set (some tests need this), and set ANT_OPTS to increase the size allocated to the Permanent Generation.
How do I run all of the unit tests except for a certain few tests?
Similar to running all tests, but define test.excludes.additional to specify a test or pattern to exclude from the test run. For example, the following will run all tests except for the CliDriver tests:
```
cd itests
mvn test -Dtest.excludes.additional='**/Test*CliDriver.java' -Phadoop-1
```
How do I update the output of a CliDriver testcase?
Ant:
```
ant test -Dtestcase=TestCliDriver -Dqfile=alter1.q -Doverwrite=true
```
MVN:
```
cd itests/qtest
mvn test -Dtest=TestCliDriver -Dqfile=alter1.q -Dtest.output.overwrite=true -Phadoop-1
```
As of Hive 0.11.0+ you can cut this time in half by specifying that only the ql module needs to rebuild:
```
ant test -Dmodule=ql -Dtestcase=TestCliDriver -Dqfile=alter1.q -Doverwrite=true
```
How do I update the results of many test cases?
Assume that you have a file like below which you'd like to re-generate output files for. Such a file could be created by copying the output from the precommit tests.
```
$ head -2 /tmp/failed-TestCliDriver-file-tests
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
```
You can re-generate all of those output files, in batches of 30, with the command below:
```
egrep 'TestCliDriver' /tmp/failed-TestCliDriver-file-tests \
  | perl -pe 's@.*testCliDriver_@@g' \
  | awk '{print $1 ".q"}' \
  | xargs -n 30 \
  | perl -pe 's@ @,@g' \
  | xargs -I{} mvn test -Dtest=TestCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile={}
```
How do I run the clientpositive/clientnegative unit tests?
All of the below require that you have previously run ant package.
To run clientpositive tests
```
ant -Dtestcase=TestCliDriver test
```
MVN:
```
cd itests/qtest
mvn test -Dtest=TestCliDriver -Phadoop-1
```
To run a single clientnegative test alter1.q
```
ant -Dtestcase=TestNegativeCliDriver -Dqfile=alter1.q test
```
MVN:
```
cd itests/qtest
mvn test -Dtest=TestNegativeCliDriver -Dqfile=alter1.q -Phadoop-1
```
To run all of the clientpositive tests that match a regex, for example the partition_wise_fileformat tests
```
ant -Dtestcase=TestCliDriver -Dqfile_regex=partition_wise_fileformat.* test
```
MVN:
```
cd itests/qtest
mvn test -Dtest=TestCliDriver -Dqfile_regex=partition_wise_fileformat.* -Phadoop-1
```
To run a single contrib test alter1.q and overwrite the result file
```
ant -Dtestcase=TestContribCliDriver -Dqfile=alter1.q -Doverwrite=true test
```
MVN:
```
cd itests/qtest
mvn test -Dtest=TestContribCliDriver -Dqfile=alter1.q -Dtest.output.overwrite=true -Phadoop-1
```
To run a single test groupby1.q and output detailed information during execution
```
ant -Dtestcase=TestCliDriver -Dqfile=groupby1.q -Dtest.silent=false test
```
As of Hive 0.11.0+ you can cut down the total build time by specifying that only the ql module needs to rebuild. For example, to run all the partition_wise_fileformat tests:
```
ant -Dmodule=ql -Dtestcase=TestCliDriver -Dqfile_regex=partition_wise_fileformat.* test
```
How do I rerun precommit tests over the same patch?
Upload the exact same patch again to the JIRA.