...

This creates a build subdirectory containing the compiled plugin, pdk-test-udf-0.1.jar. It also creates a build/metadata directory containing add-jar.sql (which shows the command for loading the plugin jar) and class-registration.sql (which shows the commands for loading the UDFs from the plugin). You can pass the .sql files to the Hive CLI's -i command-line parameter to run them as initialization scripts.

You can run the tests associated with the plugin via

...

  • your-plugin-root
    • build.xml
    • src : contains Java source files
    • test : contains setup.sql, cleanup.sql, and any datafiles needed by your tests

For the example plugin, the datafile onerow.txt contains a single row of data. setup.sql creates a table named onerow and loads the datafile into it; cleanup.sql drops the onerow table. The onerow table is convenient for testing UDFs.

Annotations

Now let's take a look at the source code for a UDF.

package org.apache.hive.pdktest;

import org.apache.hive.pdk.HivePdkUnitTest;
import org.apache.hive.pdk.HivePdkUnitTests;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Example UDF for rot13 transformation.
 */
@Description(name = "rot13",
  value = "_FUNC_(str) - Returns str with all characters transposed via rot13",
  extended = "Example:\n"
  + "  > SELECT _FUNC_('Facebook') FROM src LIMIT 1;\n" + "  'Snprobbx'")
@HivePdkUnitTests(
    setup = "create table rot13_data(s string); "
    + "insert overwrite table rot13_data select 'Facebook' from onerow;",
    cleanup = "drop table if exists rot13_data;",
    cases = {
      @HivePdkUnitTest(
        query = "SELECT tp_rot13('Mixed Up!') FROM onerow;",
        result = "Zvkrq Hc!"),
      @HivePdkUnitTest(
        query = "SELECT tp_rot13(s) FROM rot13_data;",
        result = "Snprobbx")
    }
  )
public class Rot13 extends UDF {
  private Text t = new Text();

  public Rot13() {
  }

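  // Returns the rot13 transformation of s, or null when the input is NULL.
  public Text evaluate(Text s) {
    if (s == null) {
      return null;
    }
    StringBuilder out = new StringBuilder(s.getLength());
    char[] ca = s.toString().toCharArray();
    for (char c : ca) {
      // Rotate ASCII letters by 13 positions; leave everything else unchanged.
      if (c >= 'a' && c <= 'm') {
        c += 13;
      } else if (c >= 'n' && c <= 'z') {
        c -= 13;
      } else if (c >= 'A' && c <= 'M') {
        c += 13;
      } else if (c >= 'N' && c <= 'Z') {
        c -= 13;
      }
      out.append(c);
    }
    // Reuse the same Text instance across calls instead of allocating a new one.
    t.set(out.toString());
    return t;
  }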
}

The annotations are interpreted by the PDK as follows:

  • @Description: provides metadata to Hive about the function syntax and usage. Only classes with this annotation are included in the generated class-registration.sql.
  • @HivePdkUnitTests: enumerates one or more test cases, and also specifies optional setup and cleanup commands to run before and after the test cases.
  • @HivePdkUnitTest: specifies one test case, consisting of the query to run and the expected result.
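
To make the idea of annotation-driven tests more concrete, here is a minimal sketch of how a tool could read such annotations through Java reflection. It is not the PDK's implementation: the UnitTest and UnitTests annotation types, the Example class, and the describe method are hypothetical stand-ins defined locally so that the sketch compiles without Hive on the classpath. Giving setup and cleanup empty defaults is also an assumption, based only on the statement above that they are optional.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationSketch {
    // Hypothetical stand-in for @HivePdkUnitTest (not the real PDK type).
    @Retention(RetentionPolicy.RUNTIME)
    @interface UnitTest {
        String query();
        String result();
    }

    // Hypothetical stand-in for @HivePdkUnitTests; the defaults are an assumption.
    @Retention(RetentionPolicy.RUNTIME)
    @interface UnitTests {
        String setup() default "";
        String cleanup() default "";
        UnitTest[] cases();
    }

    // A class annotated in the same shape as the Rot13 example.
    @UnitTests(
        setup = "create table rot13_data(s string);",
        cleanup = "drop table if exists rot13_data;",
        cases = {
            @UnitTest(query = "SELECT tp_rot13('Mixed Up!') FROM onerow;", result = "Zvkrq Hc!"),
            @UnitTest(query = "SELECT tp_rot13(s) FROM rot13_data;", result = "Snprobbx")
        })
    static class Example {
    }

    // Lists the setup command, each query with its expected result, and the cleanup command.
    static String describe(Class<?> c) {
        UnitTests tests = c.getAnnotation(UnitTests.class);
        if (tests == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("setup: ").append(tests.setup()).append('\n');
        for (UnitTest t : tests.cases()) {
            sb.append("query: ").append(t.query())
              .append(" expects: ").append(t.result()).append('\n');
        }
        sb.append("cleanup: ").append(tests.cleanup()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(Example.class));
    }
}
```

The annotations must have runtime retention to be visible through getAnnotation; a tool that works at build time could instead process them with an annotation processor.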

Annotations allow the code and tests to be kept close together. This works well for small tests; if your tests are very complicated, you may want to set up your own scripting around the Hive CLI.
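
Before declaring an expected result in a test case, it can help to check it outside Hive. The following standalone sketch applies the same character mapping as Rot13.evaluate to plain Java strings, so it needs no Hive or Hadoop classes. The class name Rot13Check and its rot13 method are hypothetical helpers introduced for illustration; they are not part of the PDK.

```java
public class Rot13Check {
    // Same letter rotation as the Rot13 UDF, applied to a plain String.
    static String rot13(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M')) {
                c += 13;
            } else if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z')) {
                c -= 13;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two inputs used by the test cases in the Rot13 example.
        System.out.println(rot13("Mixed Up!")); // prints Zvkrq Hc!
        System.out.println(rot13("Facebook"));  // prints Snprobbx
    }
}
```

Because rot13 is its own inverse, applying the function twice returns the original string, which gives a quick additional check.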