A prior version of the proposal can be found here.


Introduction

Some functionality is either unsupported by the DFDL expression language or too difficult to express in it. Although such functionality could be added to the Daffodil codebase, some of it may apply to only a small dataset and would not see widespread use. Adding it there also leaves the burden of implementation on the Daffodil developers. We would therefore like Daffodil to be able to register and execute external, user defined functions (UDFs) in DFDL expressions.

See DAFFODIL-2186 (ASF JIRA) for additional information.

Use Cases/Examples

A trivial use case would be adding a 'replace' function that is callable from a DFDL expression. In the DFDL schema, we might call something like the following, where transformedElem will contain "Hello_World" if someElement resolves to "Hello World".

...

Code Block
languagejava
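// Package name taken from the service file path listed under Testing.
import org.apache.daffodil.udf.UserDefinedFunction;
import org.apache.daffodil.udf.UserDefinedFunctionIdentification;

@UserDefinedFunctionIdentification(
	name = "convert_to_hae",
	namespace = "http://extOther.UDFunction.ElevationConversions.com"
)
public class MSLConversions implements UserDefinedFunction {
	// Parameter and return types are chosen by the implementer;
	// Daffodil finds this method by its name, "evaluate".
	public double evaluate(double latitude, double longitude, double msl) {
		// implementation elided in this sample
		throw new UnsupportedOperationException("conversion not implemented in this sample");
	}
}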

Requirements

  1. The UDF will be defined in a stand-alone library outside of the Daffodil codebase
  2. The UDF must be accessible to and callable from the Daffodil code
  3. Daffodil must be able to process and pass the return value from the UDF back to the Schema
  4. The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific

Proposed Solution

The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.

Daffodil Provided Jar

Daffodil will provide a UserDefinedFunction interface, a UserDefinedFunctionProvider abstract class, a UserDefinedFunctionIdentification annotation class, and two exception classes: UserDefinedFunctionFatalException and UserDefinedFunctionProcessingError.

...

An implementer can throw UserDefinedFunctionProcessingError to signal a recoverable error that will induce backtracking. An implementer can throw UserDefinedFunctionFatalException to halt processing altogether and abort Daffodil.
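As a rough illustration of when each exception might be chosen, the sketch below shows a hypothetical UDF (`ScaleByTable`, not part of the proposal) that raises one or the other. The two exception classes are local stand-ins so that the example compiles on its own. Their constructors and their unchecked superclass are assumptions, and the real classes in the Daffodil-provided jar may differ.

```java
// Local stand-ins so the sketch compiles on its own. The real classes would
// come from the Daffodil-provided jar; constructors and superclass are
// assumptions (unchecked here only to keep the sketch short).
class UserDefinedFunctionProcessingError extends RuntimeException {
    UserDefinedFunctionProcessingError(String message) { super(message); }
}

class UserDefinedFunctionFatalException extends RuntimeException {
    UserDefinedFunctionFatalException(String message) { super(message); }
}

// Hypothetical UDF that scales a value by an entry in a lookup table.
class ScaleByTable {
    private final double[] table;

    ScaleByTable() { this(new double[] { 1.0, 0.5 }); }

    // A null table simulates a resource that failed to load.
    ScaleByTable(double[] table) { this.table = table; }

    public double evaluate(int index, double value) {
        if (table == null) {
            // Nothing the parser could try next would help, so abort.
            throw new UserDefinedFunctionFatalException("scale table was not loaded");
        }
        if (index < 0 || index >= table.length) {
            // Bad input data: recoverable, so Daffodil can backtrack.
            throw new UserDefinedFunctionProcessingError("index " + index + " is out of range");
        }
        return table[index] * value;
    }
}
```

The distinction drawn here is between a problem with the data being processed (recoverable) and a problem with the UDF's own environment (fatal).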

UDF Implementation

The implementer will be expected to implement at least 2 classes: a provider class and at least one UDF class.

...

The UDF classes will contain the functionality of the UDF, embodied in an evaluate method. Because the parameter types and the return type of the evaluate method depend on the functionality, and Daffodil only needs to find the method by name, the UserDefinedFunction interface will not declare an abstract method for it. Each function that the implementer wishes to expose must therefore implement the UserDefinedFunction interface, contain an evaluate method, and carry the Daffodil-provided UserDefinedFunctionIdentification annotation on the class. See the Use Cases/Examples section for a sample UDF class.
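To make the "find evaluate by name" idea concrete, here is a minimal sketch using plain Java reflection. The interface and annotation are local stand-ins with assumed shapes, and `Replace`, `EvaluateFinder`, and the `urn:example:udf:string` namespace are hypothetical names introduced only for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Stand-in marker interface: it declares no abstract evaluate method.
interface UserDefinedFunction { }

// Stand-in annotation (assumed shape); RUNTIME retention keeps it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespace();
}

// Hypothetical UDF matching the 'replace' use case.
@UserDefinedFunctionIdentification(name = "replace", namespace = "urn:example:udf:string")
class Replace implements UserDefinedFunction {
    public String evaluate(String input, String target, String replacement) {
        return input.replace(target, replacement);
    }
}

class EvaluateFinder {
    // Returns the single public method named "evaluate";
    // rejects classes that have none or that overload it.
    static Method findEvaluate(Class<?> udfClass) {
        List<Method> found = new ArrayList<>();
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate")) {
                found.add(m);
            }
        }
        if (found.size() != 1) {
            throw new IllegalArgumentException(
                udfClass.getName() + " must declare exactly one evaluate method, found " + found.size());
        }
        return found.get(0);
    }
}
```

Calling `EvaluateFinder.findEvaluate(Replace.class)` yields a `Method` whose parameter and return types can then be inspected, which is why no abstract signature is needed on the interface.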

Daffodil Service Loader

Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.

Daffodil will have an internal object that uses the ServiceLoader iterator to aggregate and validate all the provider classes and the UDF classes they provide. This object will do the aggregation and validation at compile time, and will only initialize a UDF object and look up its method if an attempt is made to call the UDF. Any providers or UDFs that fail validation at compile time will be dropped. Any attempt to call a dropped UDF from the schema will result in an SDE.
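The aggregation and validation step could look roughly like the sketch below. It is written in Java for illustration rather than taken from Daffodil's implementation. The types are local stand-ins, `getFunctionClasses` is a hypothetical accessor, and only two of the checks are shown (the interface and a single evaluate method). `ServiceLoader` is an `Iterable`, so the same `collect` method handles both the real classpath and an in-memory list.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Stand-ins for the Daffodil-provided types (assumed shapes).
interface UserDefinedFunction { }

abstract class UserDefinedFunctionProvider {
    public abstract Class<?>[] getFunctionClasses(); // hypothetical accessor
}

class UdfCollector {
    // Entry point that would be used against the real classpath.
    static List<Class<?>> loadFromClasspath(List<String> warnings) {
        return collect(ServiceLoader.load(UserDefinedFunctionProvider.class), warnings);
    }

    static List<Class<?>> collect(Iterable<? extends UserDefinedFunctionProvider> providers,
                                  List<String> warnings) {
        List<Class<?>> accepted = new ArrayList<>();
        Iterator<? extends UserDefinedFunctionProvider> it = providers.iterator();
        while (true) {
            UserDefinedFunctionProvider provider;
            try {
                if (!it.hasNext()) break;
                provider = it.next();
            } catch (ServiceConfigurationError e) {
                // This sketch stops at the first configuration error.
                warnings.add("Stopped loading providers: " + e.getMessage());
                break;
            }
            Class<?>[] classes = provider.getFunctionClasses();
            if (classes == null || classes.length == 0) {
                warnings.add("Dropped provider with no function classes: " + provider.getClass().getName());
                continue;
            }
            for (Class<?> c : classes) {
                if (isValid(c, warnings)) accepted.add(c);
            }
        }
        return accepted;
    }

    static boolean isValid(Class<?> c, List<String> warnings) {
        if (!UserDefinedFunction.class.isAssignableFrom(c)) {
            warnings.add("Dropped " + c.getName() + ": does not implement UserDefinedFunction");
            return false;
        }
        int evaluateCount = 0;
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) evaluateCount++;
        }
        if (evaluateCount != 1) {
            warnings.add("Dropped " + c.getName() + ": expected one evaluate method, found " + evaluateCount);
            return false;
        }
        return true;
    }
}

// Hypothetical classes used to exercise the collector.
class GoodFunction implements UserDefinedFunction {
    public int evaluate(int x) { return x * 2; }
}

class OverloadedFunction implements UserDefinedFunction {
    public int evaluate(int x) { return x; }
    public long evaluate(long x) { return x; }
}

class DemoProvider extends UserDefinedFunctionProvider {
    public Class<?>[] getFunctionClasses() {
        return new Class<?>[] { GoodFunction.class, OverloadedFunction.class, String.class };
    }
}
```

Passing a list containing a `DemoProvider` to `collect` keeps `GoodFunction` and records one warning for each of the two dropped classes.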

Daffodil DFDL Processing

Acquiring the UDF

The internal object referenced above will be instantiated only if a function call from the schema is not recognized as one of the previously supported functions. We will call this object's lookup function to find the UDF based on its name and namespace. If it finds the UDF, it will return a case class containing the UDF class, the evaluate method, and the NodeInfo.Kind parameter types and return type of that method, all of which are needed to call the UDF at runtime. If the UDF is not found, we will throw an SDE.

Code Block
languagescala
val udfCallingInfo = UserDefinedFunctionService.lookupUserDefinedFunctionCallingInfo(namespace, fName)

val UserDefinedFunctionService.UserDefinedFunctionCallingInfo(udf, ei) = udfCallingInfo.get
val UserDefinedFunctionService.EvaluateMethodInfo(evaluateMethod, evaluateParamTypes, evaluateReturnType) = ei
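The lookup itself can be pictured as a map keyed by namespace and name. The Java sketch below is only an approximation of that idea: `UdfRegistry`, `CallingInfo`, and `Increment` are hypothetical names, the annotation is a local stand-in, and plain `Class<?>` values stand in for the NodeInfo.Kind types that Daffodil would actually record.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for the Daffodil-provided annotation (assumed shape).
@Retention(RetentionPolicy.RUNTIME)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespace();
}

// Hypothetical holder for what a caller needs at runtime.
final class CallingInfo {
    final Class<?> udfClass;
    final Method evaluate;
    final Class<?>[] paramTypes;
    final Class<?> returnType;

    CallingInfo(Class<?> udfClass, Method evaluate) {
        this.udfClass = udfClass;
        this.evaluate = evaluate;
        this.paramTypes = evaluate.getParameterTypes();
        this.returnType = evaluate.getReturnType();
    }
}

class UdfRegistry {
    private final Map<String, CallingInfo> byQName = new HashMap<>();

    private static String key(String namespace, String name) {
        return "{" + namespace + "}" + name;
    }

    // Assumes the class already passed validation (exactly one evaluate method).
    void register(Class<?> udfClass) {
        UserDefinedFunctionIdentification id =
            udfClass.getAnnotation(UserDefinedFunctionIdentification.class);
        if (id == null) {
            throw new IllegalArgumentException(udfClass.getName() + " has no identification annotation");
        }
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate")) {
                byQName.put(key(id.namespace(), id.name()), new CallingInfo(udfClass, m));
                return;
            }
        }
        throw new IllegalArgumentException(udfClass.getName() + " has no evaluate method");
    }

    // An empty result is the case that would be reported as an SDE.
    Optional<CallingInfo> lookup(String namespace, String name) {
        return Optional.ofNullable(byQName.get(key(namespace, name)));
    }
}

// Hypothetical UDF used to exercise the registry.
@UserDefinedFunctionIdentification(name = "increment", namespace = "urn:example:udf")
class Increment {
    public int evaluate(int x) { return x + 1; }
}
```

After `register(Increment.class)`, a lookup for `("urn:example:udf", "increment")` returns the calling information, while a lookup for any unregistered name returns an empty `Optional`.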

Calling the UDF

Within the DFDL expression processing code, Daffodil will define two case classes: UserDefinedFunctionCallExpr and UserDefinedFunctionCall. UserDefinedFunctionCallExpr will extend Daffodil's FunctionCallBase and override inherentType, targetTypeForSubexpression, and compiledDPath. It will call UserDefinedFunctionCall as follows.

...

Code Block
languagescala
case class UserDefinedFunctionCall(
  functionQNameString: String,
  recipes: List[CompiledDPath],
  userDefinedFunction: UserDefinedFunction,
  evaluateFxn: UserDefinedFunctionMethod)
  extends FNArgsList(recipes) {

  override def computeValue(values: List[Any], dstate: DState) = {
    val jValues = values.map { _.asInstanceOf[Object] }
    try {
      val res = evaluateFxn.method.invoke(userDefinedFunction, jValues: _*)
      res
    } catch {
      case e: InvocationTargetException => {
        val targetException = e.getTargetException
        targetException match {
          case te: UserDefinedFunctionProcessingError =>
            throw new UserDefinedFunctionProcessingErrorException(
              s"User Defined Function '$functionQNameString'",
              Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(te), Maybe.Nope)
          case te: Exception =>
            throw new UserDefinedFunctionFatalErrorException(
              s"User Defined Function '$functionQNameString' Error",
              te, userDefinedFunction.getClass.getName)
        }
      }
      case e @ (_: IllegalArgumentException | _: NullPointerException | _: ReflectiveOperationException) =>
        throw new UserDefinedFunctionProcessingErrorException(
          s"User Defined Function '$functionQNameString'",
          Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(e), Maybe.Nope)
      case e: ExceptionInInitializerError =>
        throw new UserDefinedFunctionFatalErrorException(
          s"User Defined Function '$functionQNameString' Error",
          e, userDefinedFunction.getClass.getName)
    }
  }
}

Diagnostics

We intend to supply the user with at least the following errors and warnings:

  • Warning: Any ignored/dropped User Defined Functions or User Defined Function Providers
  • Error: Errors loading User Defined Function Providers or initializing User Defined Functions
  • Info: User Defined Function Loaded
  • SDE: No User Defined Function class with the specified name/namespace found

Testing

| Focus | ID | Description | Test Data |
| --- | --- | --- | --- |
| ServiceLoader API | 1 | Tests when there are no providers found by the ServiceLoader API due to missing or empty META-INF file | No META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file on classpath of classLoader (CLI Test) |
| | 2 | Tests when there is an error thrown from ServiceLoader API | META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file contains typo in class name |
| Provider Class | 3 | Tests when UDF Provider has no function classes | UDF with no call to setFunctionClasses initializing functionClasses to array of classes |
| | 4 | Tests when UDF Provider has empty function class | UDF with call to setFunctionClasses initializing functionClasses to empty array of classes |
| Function Class | 6 | Tests when function classes don't implement UserDefinedFunction interface | UDF with function class that doesn't implement UserDefinedFunction interface |
| | 7 | Tests when function classes don't have annotations | UDF with function class that doesn't have UserDefinedFunctionIdentification annotation |
| | 8 | Tests when function classes have empty/invalid annotation fields | UDF with function class whose annotation has empty fields |
| | 9 | Tests when function classes have no evaluate function | UDF with function class whose method isn't named "evaluate" |
| | 10 | Tests when function can't be found | Function call from schema with either a nonexistent namespace or a nonexistent name |
| Evaluate function | 11 | Tests when function class has overloaded evaluate function | UDF with overloaded evaluate function |
| | 12 | Tests when number of arguments is incorrect | Function call from schema with incorrect arg number |
| | 13 | Tests when argument types are incorrect | Function call from schema with incorrect arg type |
| | 14 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Calendar) |
| | 15 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays |
| | 16 | Tests UDF with no args | UDF with no param and static return type |
| | 17 | Tests UDF with no return type | UDF with void return type |
| Primitive Arg/Return Types Testing | 18 | Tests UDF with primitive int params and returns | UDF with primitive params and return |
| | 19 | Tests UDF with primitive byte params and returns | UDF with primitive params and return |
| | 20 | Tests UDF with primitive short params and returns | UDF with primitive params and return |
| | 21 | Tests UDF with primitive long params and returns | UDF with primitive params and return |
| | 22 | Tests UDF with primitive double params and returns | UDF with primitive params and return |
| | 23 | Tests UDF with primitive float params and returns | UDF with primitive params and return |
| | 24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return |
| Boxed Args/Return Type Testing | 25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return |
| | 26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return |
| | 27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return |
| | 28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return |
| | 29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return |
| | 30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return |
| | 31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return |
| Other Param/Return Types | 32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns |
| | 33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns |
| | 34 | Tests UDF with String params and returns | UDF with specified params and returns |
| | 35 | Tests UDF with Byte Array params and returns | UDF with specified params and returns |
| | 36 | Tests UDF with URI params and returns | UDF with specified params and returns |

Prototype

UDF Jars: HAEMSLConversions.jar and UDFunctionProviderImpl.jar. Both build on UDFunctionProvider.jar, whose classes they extend.

...