A prior version of the proposal can be found here.
Introduction
Some functionalities are either not supported or too difficult to express using the DFDL expression language. While it is possible to add desired functionality to the codebase, some may be applicable to only a small dataset and won't have widespread use. It also leaves the burden of implementing said functionalities on the Daffodil developers. We would like Daffodil to be able to register and execute external/user defined functions (UDFs) in DFDL expressions.
Use Cases/Examples
A trivial use case would be adding a 'replace' function that is callable from a DFDL expression. In the DFDL Schema, we might call something like the below, where transformedElem will contain "Hello_World" if someElement resolves to "Hello World".
...
Code Block | ||
---|---|---|
| ||
@UserDefinedFunctionIdentification( name = "replace", namespaceURI = "urn:example:com:ext:udfunction:stringfunctions" ) public class Replace implements UserDefinedFunction { public String evaluate(String orig, String pre, String post) { /* implementation... */ } } |
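To make the 'replace' example concrete, here is a minimal, self-contained sketch of what such a UDF class might look like once the body is filled in. The `UserDefinedFunction` interface and the `UserDefinedFunctionIdentification` annotation declared here are stand-ins written only so the sketch compiles on its own; the real ones would come from Daffodil. The classes are kept package-private so that everything fits in one file, and literal (non-regex) replacement is an assumption about the intended behaviour.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for the Daffodil-provided marker interface (assumption).
interface UserDefinedFunction extends Serializable {}

// Stand-in for the Daffodil-provided annotation (assumption).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

@UserDefinedFunctionIdentification(
    name = "replace",
    namespaceURI = "urn:example:com:ext:udfunction:stringfunctions")
class Replace implements UserDefinedFunction {
    // Replaces every literal occurrence of `pre` in `orig` with `post`.
    public String evaluate(String orig, String pre, String post) {
        return orig.replace(pre, post);
    }
}
```

With this body, calling the function with "Hello World", " " and "_" yields "Hello_World", matching the behaviour described above.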
...
Another use case would be implementing the normalization of elevation above Mean-Sea-Level (MSL) to Height-Above-Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, we might call something like the below, where the function returns the result of the conversion.
Code Block |
---|
xmlns:mhdf="http://extOther.UDFunction.ElevationConversions.com" ... dfdl:outputValueCalc="{ mhdf:convert_to_hae(../lat, ../lon, ../msl) }" |
The UserDefinedFunction class would look something like the below.
Code Block | ||
---|---|---|
| ||
@UserDefinedFunctionIdentification( name = "convert_to_hae", namespaceURI = "http://extOther.UDFunction.ElevationConversions.com" ) public class MSLConversions implements UserDefinedFunction { public double evaluate(double latitude, double longitude, double msl) { /* implementation.. */ } } |
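The conversion itself is not specified in this proposal. As a rough illustration only, the sketch below uses the standard relation HAE = MSL + N, where N is the geoid undulation at the given latitude and longitude. The class name `MslToHae` and the injected `geoidUndulation` function are hypothetical; a real UDF would need a no-argument way to obtain a geoid model, and the Daffodil annotation and interface are omitted here for brevity.

```java
import java.util.function.DoubleBinaryOperator;

class MslToHae {
    // Geoid undulation N(lat, lon) in metres. Supplied by the caller because a
    // real geoid model (a lookup grid, for instance) is outside this sketch.
    private final DoubleBinaryOperator geoidUndulation;

    MslToHae(DoubleBinaryOperator geoidUndulation) {
        this.geoidUndulation = geoidUndulation;
    }

    // HAE = MSL + N: ellipsoidal height is height above mean sea level plus
    // the geoid undulation at that position.
    public double evaluate(double latitude, double longitude, double msl) {
        if (latitude < -90.0 || latitude > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        return msl + geoidUndulation.applyAsDouble(latitude, longitude);
    }
}
```

Passing the undulation in as a function keeps the arithmetic testable without committing to any particular geoid data set.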
Requirements
- The UDF will be defined in a stand-alone library outside of the Daffodil codebase
- The UDF must be accessible to and callable from the Daffodil code
- Daffodil must be able to process and pass the return value from the UDF back to the Schema
- The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific
Proposed Solution
The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.
Daffodil Provided
...
Classes
Daffodil will provide a UserDefinedFunction interface, a UserDefinedFunctionProvider abstract class, a UserDefinedFunctionIdentification annotation class, and two exception classes: UserDefinedFunctionFatalException and UserDefinedFunctionProcessingError.
Each UDF must implement the UserDefinedFunction interface. This marks it as a UDF to Daffodil and gives it some properties such as serializability.
...
The UserDefinedFunctionProcessingError exception can be thrown when an implementer wishes to signal a recoverable error that will induce backtracking. The UserDefinedFunctionFatalException exception can be thrown to halt processing altogether and abort Daffodil.
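The split between the two exceptions can be sketched from the implementer's side as follows. The two exception classes declared here are stand-ins for the Daffodil-provided ones, modelled as unchecked exceptions purely to keep the sketch short; their real superclasses and constructors may differ. `LookupCode` is a hypothetical UDF invented for this illustration.

```java
import java.util.Map;

// Stand-ins for the Daffodil-provided exception classes (assumptions).
class UserDefinedFunctionProcessingError extends RuntimeException {
    UserDefinedFunctionProcessingError(String message) { super(message); }
}

class UserDefinedFunctionFatalException extends RuntimeException {
    UserDefinedFunctionFatalException(String message) { super(message); }
}

// Hypothetical UDF that maps a textual key to a numeric code.
class LookupCode {
    private final Map<String, Integer> table;

    LookupCode(Map<String, Integer> table) {
        this.table = table;
    }

    public int evaluate(String key) {
        if (table == null) {
            // The function itself is unusable, so no amount of backtracking helps.
            throw new UserDefinedFunctionFatalException("lookup table was never loaded");
        }
        Integer code = table.get(key);
        if (code == null) {
            // Bad data in this branch only; the processor may backtrack and try another.
            throw new UserDefinedFunctionProcessingError("unknown key '" + key + "'");
        }
        return code;
    }
}
```

The rule of thumb this sketch assumes: problems with the data being processed are recoverable, while problems with the function's own environment are fatal.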
UDF Implementation
The implementer will be expected to implement at least two classes: a provider class and at least one UDF class.
...
The UDF classes will contain the functionality of the UDF, embodied in an evaluate method. Each function that the implementer wishes to expose must implement the UserDefinedFunction interface, contain an evaluate method, and carry the Daffodil-provided UserDefinedFunctionIdentification annotation on the class. Because the parameter types and the return type of the evaluate method depend on the functionality, and Daffodil only needs the method's name to find it, the interface will not declare an abstract evaluate method. See the Use Cases/Examples section for a sample UDF class.
Daffodil Service Loader
Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.
Daffodil will have an internal object that uses the ServiceLoader iterator to aggregate and validate all the provider classes and the UDF classes they provide. This object will do the aggregation and validation at compile time, and will only initialize a UDF object and look up its method if an attempt is made to call the UDF. Any providers or UDFs that fail validation at compile time will be dropped. Any attempt to call a dropped UDF from the schema will result in an SDE.
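The aggregation and validation step might be sketched as below. This is not Daffodil's implementation: `UdfRegistry`, the `getUserDefinedFunctionClasses` accessor, the `{namespaceURI}name` key format, and the sample classes `Upper` and `NotAUdf` are all assumptions made for illustration, and the interface, annotation and provider types are local stand-ins. Only `ServiceLoader`, `ServiceConfigurationError` and the reflection calls are real JDK APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Stand-ins for the Daffodil-provided types (assumptions).
interface UserDefinedFunction extends java.io.Serializable {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

abstract class UserDefinedFunctionProvider {
    // Accessor name is an assumption made for this sketch.
    abstract Class<?>[] getUserDefinedFunctionClasses();
}

class UdfRegistry {
    // Valid UDF classes keyed by "{namespaceURI}name".
    final Map<String, Class<?>> functions = new HashMap<>();
    // One entry per dropped provider or UDF class.
    final List<String> warnings = new ArrayList<>();

    // Polls ServiceLoader for providers; a broken entry is dropped, not fatal.
    void loadAll() {
        Iterator<UserDefinedFunctionProvider> it =
            ServiceLoader.load(UserDefinedFunctionProvider.class).iterator();
        while (true) {
            try {
                if (!it.hasNext()) break;
                register(it.next());
            } catch (ServiceConfigurationError e) {
                warnings.add("Dropped provider: " + e.getMessage());
            }
        }
    }

    // Validates every class a provider offers and keeps only the usable ones.
    void register(UserDefinedFunctionProvider provider) {
        Class<?>[] classes = provider.getUserDefinedFunctionClasses();
        if (classes == null || classes.length == 0) {
            warnings.add("Dropped provider with no UDF classes: " + provider.getClass().getName());
            return;
        }
        for (Class<?> c : classes) {
            String problem = validate(c);
            if (problem != null) {
                warnings.add("Dropped " + c.getName() + ": " + problem);
                continue;
            }
            UserDefinedFunctionIdentification id =
                c.getAnnotation(UserDefinedFunctionIdentification.class);
            functions.put("{" + id.namespaceURI() + "}" + id.name(), c);
        }
    }

    // Returns null when the class is a usable UDF, otherwise the reason it is not.
    static String validate(Class<?> c) {
        if (!UserDefinedFunction.class.isAssignableFrom(c)) {
            return "does not implement UserDefinedFunction";
        }
        UserDefinedFunctionIdentification id =
            c.getAnnotation(UserDefinedFunctionIdentification.class);
        if (id == null) return "missing UserDefinedFunctionIdentification annotation";
        if (id.name().trim().isEmpty() || id.namespaceURI().trim().isEmpty()) {
            return "annotation has empty fields";
        }
        int evaluateCount = 0;
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) evaluateCount++;
        }
        if (evaluateCount == 0) return "no evaluate method";
        if (evaluateCount > 1) return "overloaded evaluate method";
        return null;
    }
}

// Sample classes for exercising the registry (hypothetical).
@UserDefinedFunctionIdentification(name = "upper", namespaceURI = "urn:example:udf")
class Upper implements UserDefinedFunction {
    public String evaluate(String s) { return s.toUpperCase(); }
}

class NotAUdf {
    public String evaluate(String s) { return s; }
}
```

Note that the sketch stores only the class, not an instance, which mirrors the idea that a UDF object is created only when a call to it is actually attempted.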
Daffodil DFDL Processing
Acquiring the UDF
The internal object referenced above will be instantiated only if a function call from the schema is not recognized as one of our previously supported functions. We will call this object's lookup function to find the UDF based on the name and namespace. If it finds the UDF, it will return a case class containing the UDF class, the evaluate method, and the NodeInfo.Kind of its parameter types and return type, all of which are needed to call the UDF at runtime. If the UDF is not found, we'll throw an SDE.
Code Block | ||
---|---|---|
| ||
val udfCallingInfo = UserDefinedFunctionService.lookupUserDefinedFunctionCallingInfo(namespace, fName) val UserDefinedFunctionService.UserDefinedFunctionCallingInfo(udf, ei) = udfCallingInfo.get val UserDefinedFunctionService.EvaluateMethodInfo(evaluateMethod, evaluateParamTypes, evaluateReturnType) = ei |
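The reflective part of the lookup, finding the evaluate method and recording its signature, could look roughly like this in Java. `EvaluateMethodInfo` here is a simplified, hypothetical holder that records plain Java classes; mapping those classes to NodeInfo.Kind is left out because it depends on Daffodil internals. `Repeat` is a made-up sample class.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical holder for what is needed to call a UDF later on.
final class EvaluateMethodInfo {
    final Method method;
    final List<Class<?>> paramTypes;
    final Class<?> returnType;

    private EvaluateMethodInfo(Method method) {
        this.method = method;
        this.paramTypes = Arrays.asList(method.getParameterTypes());
        this.returnType = method.getReturnType();
    }

    // Finds the single public method named "evaluate".
    // Returns empty when there is none or when it is overloaded.
    static Optional<EvaluateMethodInfo> of(Class<?> udfClass) {
        Method found = null;
        for (Method m : udfClass.getMethods()) {
            if (!m.getName().equals("evaluate")) continue;
            if (found != null) return Optional.empty(); // overloaded
            found = m;
        }
        return Optional.ofNullable(found).map(EvaluateMethodInfo::new);
    }
}

// Sample class used to exercise the lookup (hypothetical).
class Repeat {
    public String evaluate(String s, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) sb.append(s);
        return sb.toString();
    }
}
```

An empty result is where the caller would raise the SDE described above.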
Calling the UDF
Within the DFDL expression processing code, Daffodil will define 2 case classes, a UserDefinedFunctionCallExpr and a UserDefinedFunctionCall. UserDefinedFunctionCallExpr will extend Daffodil's FunctionCallBase, and override inherentType, targetTypeForSubexpression and compiledDPath. It will call UserDefinedFunctionCall as follows.
...
Code Block | ||
---|---|---|
| ||
case class UserDefinedFunctionCall( functionQNameString: String, recipes: List[CompiledDPath], userDefinedFunction: UserDefinedFunction, evaluateFxn: UserDefinedFunctionMethod) extends FNArgsList(recipes) { override def computeValue(values: List[Any], dstate: DState) = { val jValues = values.map { _.asInstanceOf[Object] } try { val res = evaluateFxn.method.invoke(userDefinedFunction, jValues: _*) res } catch { case e: InvocationTargetException => { val targetException = e.getTargetException targetException match { case te: UserDefinedFunctionProcessingError => throw new UserDefinedFunctionProcessingErrorException( s"User Defined Function '$functionQNameString'", Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(te), Maybe.Nope) case te: Exception => throw new UserDefinedFunctionFatalErrorException( s"User Defined Function '$functionQNameString' Error", te, userDefinedFunction.getClass.getName) } } case e @ (_: IllegalArgumentException | _: NullPointerException | _: ReflectiveOperationException) => throw new UserDefinedFunctionProcessingErrorException( s"User Defined Function '$functionQNameString'", Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(e), Maybe.Nope) case e: ExceptionInInitializerError => throw new UserDefinedFunctionFatalErrorException( s"User Defined Function '$functionQNameString' Error", e, userDefinedFunction.getClass.getName) } } } |
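The error-mapping logic in the Scala code above can be restated as a small standalone Java sketch, which may be easier to experiment with outside Daffodil. The three exception classes are local stand-ins with simplified constructors (the real ones take schema and context locations), and `UdfInvoker` and `Halve` are hypothetical names. Treating every non-processing-error thrown by the UDF as fatal is this sketch's reading of the Scala match.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Stand-in for the error a UDF implementer throws (assumption: unchecked).
class UserDefinedFunctionProcessingError extends RuntimeException {
    UserDefinedFunctionProcessingError(String message) { super(message); }
}

// Stand-ins for the exceptions Daffodil raises after a failed call.
class UserDefinedFunctionProcessingErrorException extends RuntimeException {
    UserDefinedFunctionProcessingErrorException(String message, Throwable cause) {
        super(message, cause);
    }
}

class UserDefinedFunctionFatalErrorException extends RuntimeException {
    UserDefinedFunctionFatalErrorException(String message, Throwable cause) {
        super(message, cause);
    }
}

class UdfInvoker {
    // Returns the first public method named "evaluate" on the class.
    static Method evaluateOf(Class<?> udfClass) {
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate")) return m;
        }
        throw new IllegalArgumentException("no evaluate method on " + udfClass.getName());
    }

    // Invokes evaluate reflectively and maps failures to recoverable or fatal.
    static Object call(Object udf, Method evaluate, Object... args) {
        String what = "User Defined Function '" + udf.getClass().getName() + "'";
        try {
            return evaluate.invoke(udf, args);
        } catch (InvocationTargetException e) {
            // The UDF itself threw; unwrap to see what it was.
            Throwable thrown = e.getCause();
            if (thrown instanceof UserDefinedFunctionProcessingError) {
                throw new UserDefinedFunctionProcessingErrorException(what, thrown);
            }
            throw new UserDefinedFunctionFatalErrorException(what + " Error", thrown);
        } catch (IllegalArgumentException | NullPointerException | ReflectiveOperationException e) {
            // The call could not be made as written (wrong arguments, no access).
            throw new UserDefinedFunctionProcessingErrorException(what, e);
        } catch (ExceptionInInitializerError e) {
            throw new UserDefinedFunctionFatalErrorException(what + " Error", e);
        }
    }
}

// Sample UDF used to exercise the invoker (hypothetical).
class Halve {
    public int evaluate(int n) {
        if (n % 2 != 0) throw new UserDefinedFunctionProcessingError("odd input: " + n);
        return n / 2;
    }
}
```

The key step is unwrapping `InvocationTargetException`: reflection hides whatever the UDF threw inside it, so the cause has to be inspected before deciding whether backtracking is possible.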
Diagnostics
We intend to supply the user with at least the following errors/warnings:
- Warning: Any ignored/dropped User Defined Function or User Defined Function Providers
- Error: Errors loading User Defined Function Providers or initializing User Defined Functions
- Info: User Defined Function Loaded
- SDE: No User Defined Function class with specified name/namespace found
Testing
Focus | ID | Description | Test Data |
---|---|---|---|
ServiceLoader API | 1 | Tests when no providers are found by the ServiceLoader API due to a missing or empty META-INF file | No META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file on classpath of classLoader (CLI Test) |
 | 2 | Tests when an error is thrown from the ServiceLoader API | META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file names a provider class that doesn’t exist (typo in class name) |
 | 3 | Tests when UDF Provider has no function classes | UDFP whose getUDF func returns null |
 | 4 | Tests when UDF Provider has empty function classes | UDFP whose getUDF func returns empty array of classes |
 | 5 | Tests when function classes don’t implement UserDefinedFunction interface | UDF with function class that doesn’t implement UserDefinedFunction interface |
 | 6 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have UserDefinedFunctionIdentification annotation |
 | 7 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation with empty fields |
 | 8 | Tests when function classes have no evaluate function | UDF with function class that doesn’t have a method called evaluate |
 | 9 | Tests when function can’t be found | Function call from schema with no matching UDF loaded |
Evaluate function | 10 | Tests when function class has overloaded evaluate function | UDF with overloaded evaluate function |
 | 11 | Tests when number of arguments is incorrect | Function call from schema with incorrect arg number |
 | 12 | Tests when argument types are incorrect | Function call from schema with incorrect arg type |
 | 13 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Array of String) |
 | 14 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays |
 | 15 | Tests UDF with no args | UDF with no params |
 | 16 | Tests UDF with no return type | UDF with void return type |
Primitive Arg/Return Types Testing | 17 | Tests UDF with primitive int params and returns | UDF with primitive params and return |
 | 18 | Tests UDF with primitive byte params and returns | UDF with primitive params and return |
 | 19 | Tests UDF with primitive byte array params and returns | UDF with primitive params and return |
 | 20 | Tests UDF with primitive short params and returns | UDF with primitive params and return |
 | 21 | Tests UDF with primitive long params and returns | UDF with primitive params and return |
 | 22 | Tests UDF with primitive double params and returns | UDF with primitive params and return |
 | 23 | Tests UDF with primitive float params and returns | UDF with primitive params and return |
 | 24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return |
 | 25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return |
 | 26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return |
 | 27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return |
 | 28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return |
 | 29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return |
 | 30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return |
 | 31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return |
Other Param/Return Types | 32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns |
 | 33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns |
 | 34 | Tests UDF with String params and returns | UDF with specified params and returns |
 | 35 | Tests when no UDFs called, and no UDFs available to be loaded | No UDFs on classpath, no UDF in schema |
 | 36 | Tests when UDFs called, but no UDFs loaded | No UDFs on classpath, UDF in schema |
 | 37 | Tests when UDF called with default namespace | Default namespace set to UDF namespaceURI; UDF calls with no prefix |
 | 38 | Tests when exceptions thrown during loading UDFP | UDFP class throws exception in class |
 | 39 | Tests when exceptions thrown during loading UDFP’s UDF classes | UDFP throws exception in getUDFs function |
 | 40 | Tests when exceptions thrown during loading UDF | UDF throws exception in class |
 | 41 | Tests when custom exceptions thrown during evaluating (FatalError) | UDF throws exception in evaluate function |
 | 42 | Tests when UDFProcessingError thrown during evaluating (ProcessingError) | UDF throws UDFProcessingError in evaluate function |
 | 43 | Tests when UDF initializer returns object of wrong type | UDFP’s initialization function creates UDF object of different type |
Prototype
UDF Jars: HAEMSLConversions.jar and UDFunctionProviderImpl.jar. Both extend UDFunctionProvider.jar.
MockDaffodil.jar contains a Scala app that also contains a Java class that uses ServiceLoader. It needs UDFunctionProviderImpl.jar and UDFunctionProvider.jar.
Pull Requests
https://github.com/apache/incubator-daffodil/pull/273 - Initial Proposal
https://github.com/apache/incubator-daffodil/pull/279 - Final Product
...