Impala has rich support for user-defined functions (UDFs) and user-defined aggregate functions (UDAFs). Many of Impala's builtin functions are implemented using the UDF and UDAF interfaces, but they are hooked into Impala so that they are loaded automatically at startup. These functions are registered in the "_impala_builtins" database, and you can list them by running the query "show functions in _impala_builtins;".

  1. Implement the function the same way you would implement a non-builtin UDF or UDAF (see the Impala documentation and the UDF development kit for details). All builtins are implemented in files under be/src/exprs/ in the Impala repository.
    • Files whose names end in -ir.cc are cross-compiled to LLVM IR so that Impala's runtime code generation can inline the functions into the compiled query. This cross-compilation is not set up automatically, so add any new -ir.cc file to the places where the existing -ir.cc files are listed.
    • If you add a new .cc file (including -ir.cc files), add a call to one of its functions in ScalarExprEvaluator::InitBuiltinsDummy() in be/src/exprs/scalar-expr-evaluator.cc. This ensures that your code is linked into the final Impala binary.
  2. For a UDF, add the new function to common/function-registry/impala_functions.py.
    • Each function entry requires: <function name(s)>, <return type>, <list of argument types>, <function symbol>
       E.g. [['getbit'], 'TINYINT', ['BIGINT', 'INT'], '_ZN6impala16BitByteFunctions6GetBitIN10impala_udf9BigIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE'],

    • For functions defined in -ir.cc files, you can use the fully qualified function name as the <function symbol>, e.g. impala::StringFunctions::Length.
    • After editing, run "make function-registry" to regenerate the Java code (ScalarBuiltins.java), then rebuild the backend (be) and frontend (fe).
  3. For a UDAF, you may need to add the function to fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java.
  4. Write tests, for example in be/src/exprs/expr-test.cc and in testdata/workloads/functional-query/queries/QueryTest/exprs.test (run by tests/query_test/test_exprs.py).
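The entry format from step 2 can be sanity-checked before running "make function-registry". The sketch below is a hypothetical helper (check_entry is not part of Impala). It looks only at the four required fields listed above and ignores anything after them. Treating a symbol as valid when it starts with "_Z" (an Itanium-mangled name) or contains "::" (a qualified name, as allowed for -ir.cc functions) is a heuristic assumption. The getbit entry is copied from above, and the types in the length entry are illustrative.

```python
# Hypothetical helper, not part of Impala: sanity-checks one registry entry of
# the form [<function names>, <return type>, [<argument types>], <symbol>].
# Only the first four fields are inspected.


def check_entry(entry):
    """Return a list of problems found in a registry entry (empty if none)."""
    if not isinstance(entry, list) or len(entry) < 4:
        return ["entry must be a list with at least 4 fields"]
    problems = []
    names, ret_type, arg_types, symbol = entry[:4]
    if not (isinstance(names, list) and names
            and all(isinstance(n, str) and n for n in names)):
        problems.append("function names must be a non-empty list of strings")
    if not (isinstance(ret_type, str) and ret_type):
        problems.append("return type must be a non-empty string")
    # An empty argument list is allowed (zero-argument functions).
    if not (isinstance(arg_types, list)
            and all(isinstance(t, str) and t for t in arg_types)):
        problems.append("argument types must be a list of strings")
    # Heuristic: an Itanium-mangled name ("_Z...") or a qualified C++ name.
    if not (isinstance(symbol, str)
            and (symbol.startswith("_Z") or "::" in symbol)):
        problems.append("symbol must be a mangled name or a qualified C++ name")
    return problems


GETBIT_BIGINT = [['getbit'], 'TINYINT', ['BIGINT', 'INT'],
                 '_ZN6impala16BitByteFunctions6GetBitIN10impala_udf9BigIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE']
# Illustrative entry using a qualified name instead of a mangled symbol.
LENGTH = [['length'], 'INT', ['STRING'], 'impala::StringFunctions::Length']

if __name__ == "__main__":
    for e in (GETBIT_BIGINT, LENGTH):
        print(e[0][0], check_entry(e) or "ok")
```

A check like this catches malformed entries early. It does not verify that the symbol actually exists in the binary; the nm approach in the tip below is the way to confirm that.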

Tip: After building, you can find a function's mangled symbol name by running "nm -g" on the compiled impalad binary or on the object (.cc.o) file, e.g.:

$ nm -g be/build/latest/service/impalad | grep GetBit
0000000000e2f0e0 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf10TinyIntValEEES3_PNS2_15FunctionContextERKT_RKNS2_6IntValE
0000000000e2f210 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf11SmallIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE
0000000000e2f340 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf6IntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKS3_
0000000000e2f460 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf9BigIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE

Or, on the object file:

$ find be -name '*.cc.o' | grep bit-byte-functions
be/src/exprs/CMakeFiles/Exprs.dir/bit-byte-functions-ir.cc.o

$ nm -g be/src/exprs/CMakeFiles/Exprs.dir/bit-byte-functions-ir.cc.o | grep GetBit
0000000000000000 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf10TinyIntValEEES3_PNS2_15FunctionContextERKT_RKNS2_6IntValE
0000000000000000 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf11SmallIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE
0000000000000000 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf6IntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKS3_
0000000000000000 W _ZN6impala16BitByteFunctions6GetBitIN10impala_udf9BigIntValEEENS2_10TinyIntValEPNS2_15FunctionContextERKT_RKNS2_6IntValE
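Because GetBit is a template, nm lists one symbol per instantiation, so you still have to pick the one that matches the argument types in your registry entry. In the Itanium C++ ABI mangling used by GCC and Clang, identifiers are length-prefixed, N…E wraps a nested name, and I…E wraps template arguments. As a result, GetBit<impala_udf::BigIntVal> appears as the fragment 6GetBitIN10impala_udf9BigIntValE. The sketch below is a hypothetical script (find_instantiation is not an Impala tool) that filters nm output on that fragment. It assumes, as in the GetBit example, that the first template argument is a type in the impala_udf namespace.

```python
import sys

# Hypothetical helper, not an Impala tool: selects the mangled symbol of one
# template instantiation from `nm -g` output. Assumes the first template
# argument is a type from the impala_udf namespace, as with GetBit.


def find_instantiation(nm_output, func, udf_type):
    """Return mangled names that instantiate func<impala_udf::udf_type>."""
    # Itanium ABI: length-prefixed identifiers, N...E for nested names and
    # I...E for template arguments, e.g. "6GetBitIN10impala_udf9BigIntValE".
    fragment = f"{len(func)}{func}IN10impala_udf{len(udf_type)}{udf_type}E"
    matches = []
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[-1]  # nm prints the symbol name as the last column
        if fragment in name:
            matches.append(name)
    return matches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: nm -g <binary or .cc.o> | python3 find_instantiation.py <func> <udf_type>
    for symbol in find_instantiation(sys.stdin.read(), sys.argv[1], sys.argv[2]):
        print(symbol)
```

For example, piping the nm output above through this script with the arguments GetBit and BigIntVal should select the same symbol that is used in the getbit registry entry in step 2. The script file name is a placeholder.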