Hive User-Defined Functions (UDFs) are custom functions written in Java and integrated with Apache Hive. A UDF is a routine that accepts parameters, performs a specific action, and returns the resulting value. Depending on the UDF's code and the interface it implements, the return value is either a single scalar value or a complete result set. UDFs extend classical SQL functionality with custom code. Apache Hive ships with a variety of built-in UDFs and, like other SQL-based solutions, lets users add custom ones as needed.

...

The list of ESRI UDFs is extensive, so the Hive documentation does not replicate the documentation for every function. The ESRI UDF documentation is available at this link.

Real-world example: Blog

Creating Custom UDFs

Apache Hive has a rich set of built-in UDFs, but if you need something the built-in functions do not cover, you can write your own. Only a little Java knowledge is needed to write a custom UDF.
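To give a feel for how little Java is involved, here is a minimal sketch of the logic of a simple scalar function. The class name TrimLowerSketch and its behavior are hypothetical, chosen only for illustration. It assumes the classic style in which the function body lives in a method named evaluate. In a real Hive project the class would also extend a Hive base class such as org.apache.hadoop.hive.ql.exec.UDF, which requires the Hive libraries on the classpath; that superclass is left out here so the sketch compiles with the JDK alone.

```java
import java.util.Locale;

// Hypothetical example class, not part of Hive.
// Assumption: in a real project this would extend org.apache.hadoop.hive.ql.exec.UDF;
// the superclass is omitted so the sketch needs nothing beyond the JDK.
final class TrimLowerSketch {

    // One input value in, one output value out.
    // A null input is passed through as null, mirroring SQL NULL handling.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        TrimLowerSketch fn = new TrimLowerSketch();
        System.out.println(fn.evaluate("  Apache HIVE ")); // prints "apache hive"
    }
}
```

The logic is ordinary Java; packaging the class into a JAR and registering it with Hive are separate steps not shown here.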

UDFs fall into three types, depending on the number of input rows and the number of output rows. Each type needs to extend (or implement) a different base class or interface.

UDF
A function that receives a single row as input and returns a single row as output.
Examples: length, round
Class: org.apache.hadoop.hive.ql.exec.UDF

UDAF
A function that receives multiple rows as input and returns a single row as output.
Examples: count, min, max

UDTF
A function that receives a single row as input and returns multiple rows (a result set or table) as output.
Examples: explode, parse_url_tuple
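The difference between the three types is easiest to see as method shapes. The following is a plain-Java sketch, not Hive API code: the class UdfShapes and its method names are hypothetical and only mimic the row cardinality of each type (one-to-one, many-to-one, one-to-many).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these methods use the Hive API.
final class UdfShapes {

    // UDF shape: one input row -> one output value (compare: length).
    static Integer scalarLength(String value) {
        return value == null ? null : value.length();
    }

    // UDAF shape: many input rows -> one output value (compare: max).
    // Null rows are skipped; an input with no non-null rows yields null.
    static Integer aggregateMax(List<Integer> rows) {
        Integer best = null;
        for (Integer v : rows) {
            if (v != null && (best == null || v > best)) {
                best = v;
            }
        }
        return best;
    }

    // UDTF shape: one input row -> many output rows (compare: explode).
    static List<String> tableExplode(String[] row) {
        List<String> out = new ArrayList<>();
        if (row != null) {
            out.addAll(Arrays.asList(row));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scalarLength("hive"));                    // prints 4
        System.out.println(aggregateMax(Arrays.asList(3, 9, 4)));    // prints 9
        System.out.println(tableExplode(new String[] {"a", "b"}));   // prints [a, b]
    }
}
```

In Hive itself, each shape is expressed by extending a different base class or implementing a different interface; only the input and output cardinality is modeled here.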