A prior version of the proposal can be found here.

Introduction

Some functionality is either unsupported or too difficult to express in the DFDL expression language. While the desired functionality could be added to the Daffodil codebase, some of it may apply to only a small dataset and would not see widespread use. Adding it there also places the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user-defined functions (UDFs) in DFDL expressions.

See DAFFODIL-2186 in the ASF JIRA for additional information.

Use Cases/Examples

A trivial use case would be adding a 'replace' function that is callable from a DFDL expression. In the DFDL schema, we might call something like the below, where transformedElem will contain "Hello_World" if someElement resolves to "Hello World".

...

Code Block
languagejava
@UserDefinedFunctionIdentification(
	name = "replace",
	namespaceURI = "urn:example:com:ext:udfunction:stringfunctions"
)
public class Replace implements UserDefinedFunction {
	public String evaluate(String orig, String pre, String post) {
		// Replace every occurrence of pre in orig with post
		return orig.replace(pre, post);
	}
}

...

Another use case would be implementing the normalization of elevation above Mean-Sea-Level (MSL) to Height-Above-Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, we might call something like the below, where the function returns the result of the conversion.

Code Block
xmlns:mhdf="http://extOther.UDFunction.ElevationConversions.com"
...
dfdl:outputValueCalc="{ mhdf:convert_to_hae(../lat, ../lon, ../msl) }"

The UserDefinedFunction class would look something like the below:

Code Block
languagejava
@UserDefinedFunctionIdentification(
	name = "convert_to_hae",
	namespaceURI = "http://extOther.UDFunction.ElevationConversions.com"
)
public class MSLConversions implements UserDefinedFunction {
	public double evaluate(double latitude, double longitude, double msl) {
		// implementation...
	}
}

Requirements

  1. The UDF will be defined in a stand-alone library outside of the Daffodil codebase
  2. The UDF must be accessible to and callable from the Daffodil code
  3. Daffodil must be able to process and pass the return value from the UDF back to the Schema
  4. The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific

Proposed Solution

The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.

Daffodil Provided

...

Classes

Daffodil will provide a UserDefinedFunction interface, a UserDefinedFunctionProvider abstract class, a UserDefinedFunctionIdentification annotation class, and two exception classes: UserDefinedFunctionFatalException and UserDefinedFunctionProcessingError.

Each UDF must implement the UserDefinedFunction interface. This marks it as a UDF to Daffodil and gives it some properties such as serializability.

...

The UserDefinedFunctionProcessingError exception can be thrown when an implementer wishes to signal a recoverable error that will induce backtracking. The UserDefinedFunctionFatalException exception can be thrown to halt processing altogether and abort Daffodil.

UDF Implementation

The implementer will be expected to implement at least two classes: a provider class and at least one UDF class.

...

The UDF classes contain the functionality of the UDF, embodied in an evaluate method. Because the parameter and return types of the evaluate method depend on the functionality, and Daffodil really only cares about the method's name, we will not provide an abstract method for it. Each function that the implementer wishes to expose must implement the UserDefinedFunction interface, contain an evaluate method, and carry the Daffodil-provided UserDefinedFunctionIdentification annotation on the class. See the Use Cases/Examples section for a sample UDF class.
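The two implementer-side classes described above might look roughly like the following. This is a minimal sketch, not the Daffodil API: the nested UserDefinedFunction, UserDefinedFunctionIdentification and UserDefinedFunctionProvider declarations are stand-ins so the example compiles on its own, and the provider method name getUserDefinedFunctionClasses and the class name StringFunctionsProvider are assumptions made for illustration.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class UdfLibrarySketch {
    // --- Stand-ins for the Daffodil-provided types (assumed shapes) ---
    interface UserDefinedFunction extends Serializable {}

    // Retained at runtime so the annotation can be read through reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UserDefinedFunctionIdentification {
        String name();
        String namespaceURI();
    }

    static abstract class UserDefinedFunctionProvider {
        // Hypothetical method name: lists the UDF classes this provider exposes.
        abstract Class<?>[] getUserDefinedFunctionClasses();
    }

    // --- What the implementer writes ---
    @UserDefinedFunctionIdentification(
        name = "replace",
        namespaceURI = "urn:example:com:ext:udfunction:stringfunctions")
    static class Replace implements UserDefinedFunction {
        public String evaluate(String orig, String pre, String post) {
            return orig.replace(pre, post);
        }
    }

    static class StringFunctionsProvider extends UserDefinedFunctionProvider {
        @Override
        Class<?>[] getUserDefinedFunctionClasses() {
            return new Class<?>[] { Replace.class };
        }
    }
}
```

One provider can expose several UDF classes by returning them all from the same array, which keeps the number of registered providers small.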

Daffodil Service Loader

Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.
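Provider discovery through the ServiceLoader API could be sketched as below. The UserDefinedFunctionProvider type here is a stand-in with an assumed method name, and loadProviders is a hypothetical helper. The only real API used is java.util.ServiceLoader, whose iterator may throw ServiceConfigurationError from hasNext or next when a listed provider cannot be located or instantiated.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

final class ProviderDiscoverySketch {
    // Stand-in for the Daffodil-provided provider base class (assumed shape).
    public static abstract class UserDefinedFunctionProvider {
        public abstract Class<?>[] getUserDefinedFunctionClasses();
    }

    // Collects every provider the ServiceLoader can instantiate. Providers that
    // fail to load are recorded in warnings and skipped rather than aborting.
    static List<UserDefinedFunctionProvider> loadProviders(List<String> warnings) {
        List<UserDefinedFunctionProvider> providers = new ArrayList<>();
        Iterator<UserDefinedFunctionProvider> it =
            ServiceLoader.load(UserDefinedFunctionProvider.class).iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                providers.add(it.next());
            } catch (ServiceConfigurationError e) {
                warnings.add("User Defined Function Provider ignored: " + e.getMessage());
            }
        }
        return providers;
    }
}
```

With no provider-configuration file on the classpath, loadProviders simply returns an empty list and records no warnings.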

Daffodil will have an internal object that uses the ServiceLoader iterator to aggregate and validate all the provider classes and the UDF classes they provide. This aggregation and validation happens at compile time; a UDF object is initialized and its method looked up only if an attempt is made to call the UDF. Any providers or UDFs that fail validation at compile time will be dropped. Any attempt to call a dropped UDF from the schema will result in an SDE.
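The validation step can be illustrated with plain reflection. The sketch below is one possible set of checks, assuming the rules stated in this proposal (the class implements UserDefinedFunction, carries a non-empty UserDefinedFunctionIdentification annotation, and has exactly one evaluate method). The nested types are stand-ins, and validate, Good and Bad are hypothetical names.

```java
import java.io.Serializable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class UdfValidationSketch {
    // Stand-ins for the Daffodil-provided types (assumed shapes).
    interface UserDefinedFunction extends Serializable {}

    @Retention(RetentionPolicy.RUNTIME)
    @interface UserDefinedFunctionIdentification {
        String name();
        String namespaceURI();
    }

    // Returns the reasons a class would be dropped; an empty list means it validates.
    static List<String> validate(Class<?> c) {
        List<String> problems = new ArrayList<>();
        if (!UserDefinedFunction.class.isAssignableFrom(c)) {
            problems.add("does not implement UserDefinedFunction");
        }
        UserDefinedFunctionIdentification id =
            c.getAnnotation(UserDefinedFunctionIdentification.class);
        if (id == null) {
            problems.add("missing UserDefinedFunctionIdentification annotation");
        } else if (id.name().trim().isEmpty() || id.namespaceURI().trim().isEmpty()) {
            problems.add("annotation has an empty name or namespaceURI");
        }
        // Count public methods named evaluate to detect missing or overloaded ones.
        int evaluateCount = 0;
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) {
                evaluateCount++;
            }
        }
        if (evaluateCount == 0) {
            problems.add("no evaluate method");
        } else if (evaluateCount > 1) {
            problems.add("overloaded evaluate method");
        }
        return problems;
    }

    // A class that passes every check.
    @UserDefinedFunctionIdentification(name = "identity", namespaceURI = "urn:example")
    static class Good implements UserDefinedFunction {
        public String evaluate(String s) { return s; }
    }

    // A class that fails the interface, annotation and evaluate checks.
    static class Bad {
        public int compute() { return 0; }
    }
}
```

Returning the list of problems rather than a boolean lets the caller turn each entry into a warning naming the dropped class.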

Daffodil DFDL Processing

Acquiring the UDF

The internal object referenced above will be instantiated only if a function call from the schema is not recognized as one of our previously supported functions. We will call this object's lookup function to find the UDF based on its name and namespace. If it finds the UDF, it will return a case class containing the UDF class, the evaluate method, and the NodeInfo.Kind types of its parameters and return value. These are needed to call the UDF at runtime. If the UDF is not found, we'll throw an SDE.
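The lookup behaviour can be sketched in Java as a map keyed by namespace and name that instantiates the UDF only when it is requested. This is an illustration under assumptions: UdfLookupSketch, CallingInfo, register and the key format are hypothetical, and Java Class objects stand in for the NodeInfo.Kind types that Daffodil would record.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class UdfLookupSketch {
    // Hypothetical holder for what is needed to call the UDF at runtime.
    static final class CallingInfo {
        final Object udf;
        final Method evaluate;
        final Class<?>[] paramTypes;
        final Class<?> returnType;

        CallingInfo(Object udf, Method evaluate) {
            this.udf = udf;
            this.evaluate = evaluate;
            this.paramTypes = evaluate.getParameterTypes();
            this.returnType = evaluate.getReturnType();
        }
    }

    private final Map<String, Class<?>> classesByKey = new HashMap<>();

    private static String key(String namespaceURI, String name) {
        return "{" + namespaceURI + "}" + name;
    }

    void register(String namespaceURI, String name, Class<?> udfClass) {
        classesByKey.put(key(namespaceURI, name), udfClass);
    }

    // Instantiates the UDF and resolves evaluate only when a call is looked up.
    // An empty result is where the caller would raise an SDE.
    Optional<CallingInfo> lookup(String namespaceURI, String name) {
        Class<?> c = classesByKey.get(key(namespaceURI, name));
        if (c == null) {
            return Optional.empty();
        }
        try {
            Object udf = c.getDeclaredConstructor().newInstance();
            for (Method m : c.getMethods()) {
                if (m.getName().equals("evaluate")) {
                    return Optional.of(new CallingInfo(udf, m));
                }
            }
            return Optional.empty();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not initialize " + c.getName(), e);
        }
    }

    // Example UDF used only to exercise the lookup.
    public static class Replace {
        public String evaluate(String orig, String pre, String post) {
            return orig.replace(pre, post);
        }
    }
}
```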

Code Block
languagescala
val udfCallingInfo = UserDefinedFunctionService.lookupUserDefinedFunctionCallingInfo(namespace, fName)

val UserDefinedFunctionService.UserDefinedFunctionCallingInfo(udf, ei) = udfCallingInfo.get
val UserDefinedFunctionService.EvaluateMethodInfo(evaluateMethod, evaluateParamTypes, evaluateReturnType) = ei

Calling the UDF

Within the DFDL expression processing code, Daffodil will define two case classes: UserDefinedFunctionCallExpr and UserDefinedFunctionCall. UserDefinedFunctionCallExpr will extend Daffodil's FunctionCallBase and override inherentType, targetTypeForSubexpression and compiledDPath. It will call UserDefinedFunctionCall as follows.

...

Code Block
languagescala
case class UserDefinedFunctionCall(
  functionQNameString: String,
  recipes: List[CompiledDPath],
  userDefinedFunction: UserDefinedFunction,
  evaluateFxn: UserDefinedFunctionMethod)
  extends FNArgsList(recipes) {

  override def computeValue(values: List[Any], dstate: DState) = {
    val jValues = values.map { _.asInstanceOf[Object] }
    try {
      val res = evaluateFxn.method.invoke(userDefinedFunction, jValues: _*)
      res
    } catch {
      case e: InvocationTargetException => {
        val targetException = e.getTargetException
        targetException match {
          case te: UserDefinedFunctionProcessingError =>
            throw new UserDefinedFunctionProcessingErrorException(
              s"User Defined Function '$functionQNameString'",
              Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(te), Maybe.Nope)
          case te: Exception =>
            throw new UserDefinedFunctionFatalErrorException(
              s"User Defined Function '$functionQNameString' Error",
              te, userDefinedFunction.getClass.getName)
        }
      }
      case e @ (_: IllegalArgumentException | _: NullPointerException | _: ReflectiveOperationException) =>
        throw new UserDefinedFunctionProcessingErrorException(
          s"User Defined Function '$functionQNameString'",
          Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(e), Maybe.Nope)
      case e: ExceptionInInitializerError =>
        throw new UserDefinedFunctionFatalErrorException(
          s"User Defined Function '$functionQNameString' Error",
          e, userDefinedFunction.getClass.getName)
    }
  }
}
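The exception handling in computeValue hinges on one detail of Java reflection: Method.invoke wraps anything thrown by the target method in an InvocationTargetException, so the original exception has to be unwrapped before deciding whether it is recoverable. The Java sketch below isolates that pattern. RecoverableError, FatalError, Flaky, call and outcome are hypothetical stand-ins, and UserDefinedFunctionProcessingError is redeclared locally so the example compiles on its own.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class UdfInvocationSketch {
    // Stand-in for the exception a UDF throws to request backtracking.
    static class UserDefinedFunctionProcessingError extends RuntimeException {
        UserDefinedFunctionProcessingError(String message) { super(message); }
    }

    // Hypothetical stand-ins for the two outcomes on the calling side.
    static class RecoverableError extends RuntimeException {
        RecoverableError(Throwable cause) { super(cause); }
    }

    static class FatalError extends RuntimeException {
        FatalError(Throwable cause) { super(cause); }
    }

    static Object call(Object udf, Method evaluate, Object... args) {
        try {
            return evaluate.invoke(udf, args);
        } catch (InvocationTargetException e) {
            // The UDF itself threw: unwrap and classify the original exception.
            Throwable target = e.getTargetException();
            if (target instanceof UserDefinedFunctionProcessingError) {
                throw new RecoverableError(target);
            }
            throw new FatalError(target);
        } catch (IllegalArgumentException | NullPointerException | ReflectiveOperationException e) {
            // The call could not be made, for example because of mismatched arguments.
            throw new RecoverableError(e);
        }
    }

    // Example UDF that can fail in both ways.
    public static class Flaky {
        public String evaluate(String s) {
            if (s.isEmpty()) throw new UserDefinedFunctionProcessingError("empty input");
            if (s.equals("boom")) throw new IllegalStateException("unexpected");
            return s.toUpperCase();
        }
    }

    // Runs Flaky.evaluate through call and reports which path was taken.
    static String outcome(String input) {
        try {
            Method m = Flaky.class.getMethod("evaluate", String.class);
            return "ok:" + call(new Flaky(), m, input);
        } catch (RecoverableError e) {
            return "recoverable";
        } catch (FatalError e) {
            return "fatal";
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here outcome("abc") takes the normal path, outcome("") takes the recoverable path, and outcome("boom") takes the fatal path.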

Diagnostics

We intend to supply the user with at least the following errors/warnings:

  • Warning: Any ignored/dropped User Defined Functions or User Defined Function Providers
  • Error: Errors loading User Defined Function Providers or initializing User Defined Functions
  • Info: User Defined Function Loaded
  • SDE: No User Defined Function class with specified name/namespace found

Testing

...

The tests cover the ServiceLoader API, provider classes, function classes, the evaluate function, and primitive, boxed, and other parameter and return types.

| ID | Description | Test Data |
|----|-------------|-----------|
| 1 | Tests when there are no providers found by the ServiceLoader API due to missing or empty META-INF file | No META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file on classpath of classLoader (CLI Test) |
| 2 | Tests when there is an error thrown from ServiceLoader API | META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file contains class that doesn’t exist |
| 3 | Tests when UDF Provider has no function classes | UDFP whose getUDF func returns null |
| 4 | Tests when UDF Provider has empty function class | UDFP whose getUDF func returns empty array of classes |
| 5 | Tests when function classes don’t implement UserDefinedFunction interface | UDF with function class that doesn’t implement UserDefinedFunction interface |
| 6 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have UserDefinedFunctionIdentification annotation |
| 7 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation with empty fields |
| 8 | Tests when function classes have no evaluate function | UDF with function class that doesn’t have method called evaluate |
| 9 | Tests when function can’t be found | Function call from schema with no matching UDF loaded |
| 10 | Tests when function classes have overloaded evaluate function | UDF with overloaded evaluate function |
| 11 | Tests when number of arguments is incorrect | Function call from schema with incorrect arg number |
| 12 | Tests when argument types are incorrect | Function call from schema with incorrect arg type |
| 13 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Array of String) |
| 14 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays |
| 15 | Tests UDF with no args | UDF with no params |
| 16 | Tests UDF with no return type | UDF with void return type |
| 17 | Tests UDF with primitive int params and returns | UDF with primitive params and return |
| 18 | Tests UDF with primitive byte params and returns | UDF with primitive params and return |
| 19 | Tests UDF with primitive byte array params and returns | UDF with primitive params and return |
| 20 | Tests UDF with primitive short params and returns | UDF with primitive params and return |
| 21 | Tests UDF with primitive long params and returns | UDF with primitive params and return |
| 22 | Tests UDF with primitive double params and returns | UDF with primitive params and return |
| 23 | Tests UDF with primitive float params and returns | UDF with primitive params and return |
| 24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return |
| 25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return |
| 26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return |
| 27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return |
| 28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return |
| 29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return |
| 30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return |
| 31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return |
| 32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns |
| 33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns |
| 34 | Tests UDF with String params and returns | UDF with specified params and returns |
| 35 | Tests when no UDFs called, and no UDFs available to be loaded | No UDFs on classpath, no UDF in schema |
| 36 | Tests when UDFs called, but no UDFs loaded | No UDFs on classpath, UDF in schema |
| 37 | Tests when UDF called with default namespace | Default namespace set to UDF namespaceURI; UDF calls with no prefix |
| 38 | Tests when exceptions thrown during loading UDFP | UDFP class throws exception in class |
| 39 | Tests when exceptions thrown during loading UDFP’s UDF classes | UDFP throws exception in getUDFs function |
| 40 | Tests when exceptions thrown during loading UDF | UDF throws exception in class |
| 41 | Tests when custom exceptions thrown during evaluating (FatalError) | UDF throws exception in evaluate function |
| 42 | Tests when UDFProcessingError thrown during evaluating (ProcessingError) | UDF throws UDFProcessingError in evaluate function |
| 43 | Tests when UDF initializer returns object of wrong type | UDFP’s initialization function creates UDF object of different type |

Prototype

UDF Jars: HAEMSLConversions.jar and UDFunctionProviderImpl.jar. Both extend UDFunctionProvider.jar.

MockDaffodil.jar contains a Scala app that also contains a Java class that uses ServiceLoader. It needs UDFunctionProviderImpl.jar and UDFunctionProvider.jar.

Attached files: HAEMSLConversions.jar, MockDaffodil.jar, UDFunctionProvider.jar
Pull Requests

https://github.com/apache/incubator-daffodil/pull/273 - Initial Proposal

https://github.com/apache/incubator-daffodil/pull/279 - Final Product

...