...

  • Compile Hive code with javac.debug=on. Run the following under the Hive checkout directory:
    Code Block
        > ant -Djavac.debug=on package
    
    If you have already built Hive without javac.debug=on, you can clean the build and then run the above command.
    Code Block
        > ant clean  # not necessary if this is the first time you compile
        > ant -Djavac.debug=on package
    
  • Run ant test with additional options to tell the Java VM that is running Hive server-side code to wait for the debugger to attach. First define some convenient environment variables for debugging. You can put them in your .bashrc (for .cshrc, use the equivalent setenv syntax).
    Code Block
        > export HIVE_DEBUG_PORT=8000
        > export HIVE_DEBUG="-Xdebug -Xrunjdwp:transport=dt_socket,address=${HIVE_DEBUG_PORT},server=y,suspend=y"
    
    In particular, HIVE_DEBUG_PORT is the port number that the JVM listens on and that the debugger attaches to. Then run the unit test as follows:
    Code Block
        > export HADOOP_OPTS=$HIVE_DEBUG
        > ant test -Dtestcase=TestCliDriver -Dqfile=<mytest>.q
    
    The unit test will run until it shows:
    Code Block
         [junit] Listening for transport dt_socket at address: 8000
    
  • Now you can use jdb to attach to port 8000 and debug:
    Code Block
        > jdb -attach 8000
    
    Or, if you are running Eclipse and the Hive projects are already imported, you can debug with Eclipse. Under Eclipse Run -> Debug Configurations, find "Remote Java Application" at the bottom of the left panel. There should be a MapRedTask configuration already. If there is no such configuration, you can create one with the following properties:
  • Name: any task such as MapRedTask
  • Project: the Hive project that you imported.
  • Connection Type: Standard (Socket Attach)
  • Connection Properties:
    • Host: localhost
    • Port: 8000
      Then click the "Debug" button; Eclipse will attach to the JVM listening on port 8000 and run to the end. If you set breakpoints in the source code before clicking "Debug", execution will stop at them. The rest is the same as debugging client-side Hive.
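
The environment setup from the steps above can be wrapped in a small shell function so that the debug port is easy to change per run. This is only a sketch: the function name hive_debug_opts is hypothetical and not part of Hive; the option string itself is the one shown above.

```shell
# Hypothetical helper (not part of Hive): print the JDWP options string
# for a given port, defaulting to 8000 as in the steps above.
hive_debug_opts() {
    port="${1:-8000}"
    printf '%s' "-Xdebug -Xrunjdwp:transport=dt_socket,address=${port},server=y,suspend=y"
}

# Export the options so the JVM started by the unit test waits for a debugger.
HADOOP_OPTS="$(hive_debug_opts 8000)"
export HADOOP_OPTS
```

After this, run the ant test command as before and attach jdb or Eclipse to the chosen port.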

There is another way of debugging Hive code without going through Ant.
You need to install Hadoop and set the environment variable HADOOP_HOME to its installation directory.

Code Block

    > export HADOOP_HOME=<your hadoop home>
 

Then, start hive:

Code Block

    >  ./build/dist/bin/hive --debug
 

It will then behave similarly to the debugging steps outlined in Debugging Hive code. It is faster since there is no need to compile Hive code
and go through Ant.
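
Since this approach depends on HADOOP_HOME being set correctly, a small pre-flight check can catch a missing or mistyped path before Hive starts. The following is a minimal sketch; the function name check_hadoop_home is hypothetical and not part of Hive or Hadoop.

```shell
# Hypothetical pre-flight check (not part of Hive): verify that HADOOP_HOME
# is set and points at an existing directory.
check_hadoop_home() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$HADOOP_HOME" ]; then
        echo "HADOOP_HOME is not a directory: $HADOOP_HOME" >&2
        return 1
    fi
    return 0
}

# Usage, from the Hive checkout directory:
#   check_hadoop_home && ./build/dist/bin/hive --debug
```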

Pluggable interfaces

File Formats

Please refer to Hive User Group Meeting August 2009, pages 59-63.

SerDe - how to add a new SerDe

Please refer to Hive User Group Meeting August 2009, pages 64-70.

Map-Reduce Scripts

Please refer to Hive User Group Meeting August 2009, pages 71-73.

UDFs and UDAFs - how to add new UDFs and UDAFs

Please refer to Hive User Group Meeting August 2009, pages 74-87.