
...

First, create a new class that extends UDF and defines one or more methods named evaluate.

Code Block

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}
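
Because a UDF class may define more than one evaluate method, which one runs depends on the argument types of the call. The following is a minimal sketch of that idea in plain Java, not Hive's actual method resolver. `EvaluateOverloads`, `LowerLike`, and `call` are hypothetical names, and `String` and `Integer` stand in for Hadoop types such as `Text` so that the example compiles without Hive on the classpath.

```java
import java.lang.reflect.Method;

final class EvaluateOverloads {

  // Hypothetical stand-in for a UDF class. It does not extend Hive's UDF,
  // and String/Integer replace Hadoop Writable types.
  static final class LowerLike {
    // One-argument form: lowercase the input, passing null through.
    public String evaluate(String s) {
      if (s == null) { return null; }
      return s.toLowerCase();
    }

    // Two-argument form: lowercase, then keep at most maxLen characters.
    public String evaluate(String s, Integer maxLen) {
      if (s == null || maxLen == null) { return null; }
      String lower = s.toLowerCase();
      int end = Math.min(Math.max(maxLen, 0), lower.length());
      return lower.substring(0, end);
    }
  }

  // Looks up the evaluate overload whose parameter types equal the runtime
  // types of the (non-null) arguments, then invokes it reflectively.
  static Object call(Object udf, Object... args) {
    Class<?>[] types = new Class<?>[args.length];
    for (int i = 0; i < args.length; i++) {
      types[i] = args[i].getClass();
    }
    try {
      Method m = udf.getClass().getMethod("evaluate", types);
      return m.invoke(udf, args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("could not invoke evaluate", e);
    }
  }
}
```

Here `EvaluateOverloads.call(new EvaluateOverloads.LowerLike(), "CMO")` reaches the one-argument method, while adding an `Integer` argument reaches the two-argument one. In a real UDF each overload would take and return Hadoop types as in the `Lower` class above.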

...

After compiling your code to a jar, add the jar to the Hive classpath. See the section below on deploying jars.

Once Hive is started with your jars on the classpath, the final step is to register your function as described in Create Function:

Code Block

create temporary function my_lower as 'com.example.hive.udf.Lower';

Now you can start using it:

Code Block

hive> select my_lower(title), sum(freq) from titles group by my_lower(title);

...

Ended Job = job_200906231019_0006
OK
cmo	13.0
vp	7.0
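
To make the query concrete, here is a rough plain-Java equivalent of grouping by the lowercased title and summing freq. It is only a sketch of the semantics, not how Hive executes the job. The contents of the titles table are not shown on this page, so any input rows passed to it are hypothetical. `TitleFreq` and `sumByLowerTitle` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TitleFreq {

  // Groups rows by lowercased title and sums their freq values, mirroring
  // "select my_lower(title), sum(freq) ... group by my_lower(title)".
  // titles[i] and freqs[i] together form one hypothetical input row.
  static Map<String, Double> sumByLowerTitle(String[] titles, double[] freqs) {
    if (titles.length != freqs.length) {
      throw new IllegalArgumentException("titles and freqs must have equal length");
    }
    // LinkedHashMap keeps first-seen order and accepts a null key,
    // so rows with a null title fall into a group of their own.
    Map<String, Double> totals = new LinkedHashMap<>();
    for (int i = 0; i < titles.length; i++) {
      String key = titles[i] == null ? null : titles[i].toLowerCase();
      totals.merge(key, freqs[i], Double::sum);
    }
    return totals;
  }
}
```

With the hypothetical rows ("CMO", 6.0), ("cmo", 7.0), ("VP", 3.0), and ("vp", 4.0), the returned map holds 13.0 for cmo and 7.0 for vp: titles that differ only in case collapse into one group, which is the effect my_lower has in the query.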

For a more involved example, see this page.

Deploying

...

Jars for User Defined Functions and User Defined SerDes

To start using your UDF, first add the jar that contains it to the classpath:

Code Block

hive> add jar my_jar.jar;
Added my_jar.jar to class path

By default, Hive looks for the jar in the current directory. You can also specify a full path:

Code Block

hive> add jar /tmp/my_jar.jar;
Added /tmp/my_jar.jar to class path

Your jar will then be on the classpath for all jobs initiated from that session. To see which jars have been added to the classpath, use:

Code Block

hive> list jars;
my_jar.jar
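
Because added jars and temporary functions belong to a session, the add jar and create temporary function statements have to be issued again in each new session. The sketch below only builds the statement text for those two steps and does not talk to Hive. `UdfSetup` and `setupStatements` are hypothetical names, and the identifier check is a simplifying assumption rather than Hive's own rule for function names.

```java
import java.util.Arrays;
import java.util.List;

final class UdfSetup {

  // Returns the two statements shown on this page, in the order they must run:
  // first put the jar on the classpath, then register the function.
  static List<String> setupStatements(String jarPath, String functionName, String className) {
    // Assumption: restrict names to a conservative identifier pattern.
    if (!functionName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
      throw new IllegalArgumentException("unexpected function name: " + functionName);
    }
    // The class name is placed inside single quotes, so reject embedded quotes.
    if (className.contains("'")) {
      throw new IllegalArgumentException("class name must not contain a quote");
    }
    return Arrays.asList(
        "add jar " + jarPath + ";",
        "create temporary function " + functionName + " as '" + className + "';");
  }
}
```

For example, `UdfSetup.setupStatements("/tmp/my_jar.jar", "my_lower", "com.example.hive.udf.Lower")` yields the add jar and create temporary function statements used earlier on this page, ready to paste at the hive> prompt.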