A prior version of the proposal can be found here.
Introduction
Some functionality is either unsupported or too difficult to express in the DPath expression language. While the desired functionality could be added to the codebase, some of it may apply to only a small dataset and would not see widespread use. Adding it there also leaves the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user defined functions (UDFs) in DPath expressions.
Use Cases/Examples
A trivial use case would be adding a replace function callable from a DPath expression. In the DFDL schema, we might write something like the following, where the function returns "Hello_World" if the element resolves to "Hello World".
Code Block |
---|
xmlns:sdf="com.ext.UDFunction.StringFunctions"
...
dfdl:choiceDispatchKey="{ sdf:replace(xs:string(../MessageTextFormatIdentifier), ' ', '_') }" |
...
Another use case would be normalizing elevation above Mean-Sea-Level (MSL) to Height-Above-Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, we might write something like the following, where the function returns the result of the conversion.
Code Block |
---|
xmlns:mhdf="http://extOther.UDFunction.ElevationConversions.com"
...
dfdl:outputValueCalc="{ mhdf:convert_to_hae(../elevation, ../hae_adjustment, ../scaling_factor) }" |
...
Code Block | ||
---|---|---|
| ||
@FunctionClassInfo(
    name = "convert_to_hae",
    namespace = "http://extOther.UDFunction.ElevationConversions.com"
)
public class MSLConversions {
    public Float evaluate(Integer elevation_msl_raw, Integer hae_adjustment_raw, Float scaling_factor) {
        // implementation..
    }
}
Requirements
- The UDF will be defined in a stand-alone library outside of the Daffodil codebase
- The UDF must be accessible to and callable from the Daffodil code
- Daffodil must be able to process and pass the return value from the UDF back to the Schema
- The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific
Proposed Solution
The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.
Daffodil Provided Jar
Daffodil will provide a UDFunctionProvider abstract class and an annotation class FunctionClassInfo.
...
The FunctionClassInfo annotation must be applied to each function class. Its name and namespace elements must both be filled in so that the provider can identify the function class when queried by Daffodil.
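As a rough illustration of what the provided jar might contain, the sketch below declares the annotation and the abstract provider class. The member names getFunctionClasses, setFunctionClasses and lookupFunctionClass are taken from elsewhere in this proposal; the retention policy, the target, the field layout and the choice to make the lookup abstract are assumptions. In a real jar both types would be public and live in their own files.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumed shape: the proposal names only the two elements, name and namespace.
@Retention(RetentionPolicy.RUNTIME) // RUNTIME retention is needed for reflective lookup
@Target(ElementType.TYPE)           // applied to function classes, not to methods
@interface FunctionClassInfo {
    String name();
    String namespace();
}

// Assumed shape of the provider base class that implementers extend.
abstract class UDFunctionProvider {
    private Class<?>[] functionClasses = new Class<?>[0];

    // The function classes this provider exposes; empty until set by the subclass.
    public Class<?>[] getFunctionClasses() {
        return functionClasses;
    }

    protected void setFunctionClasses(Class<?>[] functionClasses) {
        this.functionClasses = functionClasses;
    }

    // Returns an initialized function class object, or null when nothing matches.
    public abstract Object lookupFunctionClass(String namespace, String name);
}
```

Runtime retention matters here: with the default class-file retention, the annotation would not be visible to getAnnotation and every function class would look unannotated.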
UDF Implementation
The implementer will be expected to implement at least 2 classes: a provider class and at least one function class.
...
The function classes will contain the functionality of the UDF, embodied in an evaluate method. A function class is what will be initialized and returned by the provider class. Each function class will be expected to implement an evaluate method and to carry the Daffodil provided FunctionClassInfo annotation. Because the parameter types and the return type of the evaluate function depend on the functionality, and we really only care about the method name, we will not provide an abstract class for it. Each function that the implementer wishes to expose must be defined in a class with an evaluate function, and must be available to the lookup function. See Proposal: User Defined Functions Use Cases/Examples for a sample function class.
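To make the two-class requirement concrete, here is a minimal sketch of what an implementer might write for the replace use case. The class names Replace and StringFunctionsProvider are hypothetical, and the first two declarations are compact stand-ins for the types the Daffodil provided jar would supply, with assumed shapes.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins for the Daffodil provided types (assumed shapes, not the real jar).
@Retention(RetentionPolicy.RUNTIME)
@interface FunctionClassInfo { String name(); String namespace(); }

abstract class UDFunctionProvider {
    private Class<?>[] functionClasses = new Class<?>[0];
    public Class<?>[] getFunctionClasses() { return functionClasses; }
    protected void setFunctionClasses(Class<?>[] classes) { functionClasses = classes; }
    public abstract Object lookupFunctionClass(String namespace, String name);
}

// Function class: exposes one function, implemented in evaluate.
@FunctionClassInfo(name = "replace", namespace = "com.ext.UDFunction.StringFunctions")
class Replace {
    public String evaluate(String input, String target, String replacement) {
        return input.replace(target, replacement);
    }
}

// Provider class: lists its function classes and hands out instances on request.
class StringFunctionsProvider extends UDFunctionProvider {
    StringFunctionsProvider() {
        setFunctionClasses(new Class<?>[] { Replace.class });
    }

    @Override
    public Object lookupFunctionClass(String namespace, String name) {
        FunctionClassInfo info = Replace.class.getAnnotation(FunctionClassInfo.class);
        boolean matches = info.namespace().equals(namespace) && info.name().equals(name);
        return matches ? new Replace() : null;
    }
}
```

For the ServiceLoader to discover it, the provider would also need to be listed by its fully qualified class name in a META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file inside the implementer's jar.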
Daffodil Service Loader
Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.
...
Code Block | ||
---|---|---|
| ||
public Object lookupFunctionClass(String namespace, String name) {
    Object funcObj = null;
    try {
        // providers are keyed by "<namespace>_<name>"
        UDFunctionProvider fp = this.functionProviderLookup.get(String.join("_", namespace, name));
        if (fp != null) {
            funcObj = fp.lookupFunctionClass(namespace, name);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return funcObj;
}
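The lookup above relies on a functionProviderLookup map. One way it might be populated from the ServiceLoader is sketched below. The class name UDFunctionService is hypothetical, the two provided types are compact stand-ins with assumed shapes, and the "namespace_name" key simply mirrors the String.join call above.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Stand-ins for the Daffodil provided types (assumed shapes).
@Retention(RetentionPolicy.RUNTIME)
@interface FunctionClassInfo { String name(); String namespace(); }

abstract class UDFunctionProvider {
    private Class<?>[] functionClasses = new Class<?>[0];
    public Class<?>[] getFunctionClasses() { return functionClasses; }
    protected void setFunctionClasses(Class<?>[] classes) { functionClasses = classes; }
    public abstract Object lookupFunctionClass(String namespace, String name);
}

// Hypothetical name for the internal Daffodil class that owns the lookup table.
class UDFunctionService {
    final Map<String, UDFunctionProvider> functionProviderLookup = new HashMap<>();

    // Accepts any Iterable so that the table can be built without a classpath scan.
    UDFunctionService(Iterable<UDFunctionProvider> providers) {
        for (UDFunctionProvider fp : providers) {
            for (Class<?> c : fp.getFunctionClasses()) {
                FunctionClassInfo info = c.getAnnotation(FunctionClassInfo.class);
                if (info == null) {
                    continue; // unannotated classes are ignored
                }
                functionProviderLookup.put(String.join("_", info.namespace(), info.name()), fp);
            }
        }
    }

    // ServiceLoader is itself an Iterable over the providers found on the classpath.
    static UDFunctionService fromClasspath() {
        return new UDFunctionService(ServiceLoader.load(UDFunctionProvider.class));
    }
}
```

Note that iterating a ServiceLoader can throw ServiceConfigurationError, for example when the services file names a class that does not exist, so a fuller version would catch that and turn it into a diagnostic.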
Daffodil DPath Processing
Acquiring the UDFunctionClass
The internal class referenced above will be instantiated, and its lookup function called with the namespace and name from the functionQName object. If it returns null, we'll throw an error; otherwise we'll use the Reflection API to query for the parameter types and the return type, and to find the evaluate function. The below implementation can also be distributed
Code Block | ||
---|---|---|
| ||
val functionClass = functionClassObject.getClass
// infer argument types; this implementation supports overloading the evaluate function
val pTypes = argumentsFromDpath.map { o => o.getClass() }
// find returns an Option, so unwrap it (or fail) before querying the method
val fEvaluate = functionClass.getMethods
  .find { p => p.getName == "evaluate" && p.getParameterTypes.sameElements(pTypes) }
  .getOrElse(throw new NoSuchMethodException("No evaluate function found in function class object"))
val dResultType = NodeInfo.fromObject(fEvaluate.getReturnType)
val dArgTypes = fEvaluate.getParameterTypes.map { NodeInfo.fromObject }
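One detail the exact-match comparison above would miss is primitive parameters: a runtime argument is always boxed, so its class is Integer even when evaluate declares int. The Java sketch below shows one possible way to find a matching overload while allowing for that. EvaluateFinder and SampleFunctions are hypothetical names, and matching with isInstance is an assumption that is more permissive than the exact comparison above (it also accepts subclasses, and returns the first overload that fits).

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Optional;

class EvaluateFinder {
    // Maps each primitive parameter type to the boxed class that runtime values carry.
    private static final Map<Class<?>, Class<?>> BOXED = Map.of(
        int.class, Integer.class, long.class, Long.class,
        short.class, Short.class, byte.class, Byte.class,
        float.class, Float.class, double.class, Double.class,
        boolean.class, Boolean.class, char.class, Character.class);

    // Returns the first public evaluate overload whose parameters accept the given arguments.
    static Optional<Method> findEvaluate(Object functionObject, Object... args) {
        for (Method m : functionObject.getClass().getMethods()) {
            if (!m.getName().equals("evaluate")) {
                continue;
            }
            Class<?>[] pTypes = m.getParameterTypes();
            if (pTypes.length != args.length) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < pTypes.length && ok; i++) {
                Class<?> expected = BOXED.getOrDefault(pTypes[i], pTypes[i]);
                ok = expected.isInstance(args[i]); // false for null arguments
            }
            if (ok) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }
}

// Hypothetical function class with an overloaded evaluate, for illustration only.
class SampleFunctions {
    public int evaluate(int a, int b) { return a + b; }
    public String evaluate(String s) { return s.trim(); }
}
```

An empty result corresponds to the "No evaluate function found" diagnostic; the found method's getReturnType and getParameterTypes would then feed the DPath type mapping.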
Calling the UDFunctionClass
Within the DPath processing code, Daffodil will define 2 case classes, a UDFunctionCallExpr and a UDFunctionCall. The dResultType and dArgTypes arguments for UDFArgsListExpr will be inferred from the functionClass as shown above. The Expression functionObject will be set to the below.
...
Code Block | ||
---|---|---|
| ||
case class UDFunctionCallExpr(nameAsParsed: String, fnQName: RefQName, args: List[Expression],
  resultType: NodeInfo.Kind, argsTypes: List[NodeInfo.Kind], constructor: List[CompiledDPath] => RecipeOp)
  extends FunctionCallBase(nameAsParsed, fnQName, args) {

  lazy val argsToArgType = (args zip argsTypes).toMap

  override lazy val inherentType = resultType

  override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
    argsToArgType.get(childExpr) match {
      case Some(argType) => argType
      case None => {
        Assert.invariantFailed("subexpression %s is not an argument.".format(childExpr))
      }
    }
  }

  override lazy val compiledDPath = {
    val recipes = args.map { _.compiledDPath }
    val res = new CompiledDPath(constructor(recipes) +: conversions)
    res
  }
}

case class UDFunctionCall(recipes: List[CompiledDPath], functionClassObject: Object, udfEvaluate: Method)
  extends FNArgsList(recipes) {

  override def computeValues(values: List[Any], dstate: DState) = {
    // Method.invoke takes Object varargs, so box each value to AnyRef before passing it on
    val res = udfEvaluate.invoke(functionClassObject, values.map(_.asInstanceOf[AnyRef]): _*)
    res
  }
}
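The reflective call in computeValues can fail in two distinct ways: the UDF itself may throw, or the call may be rejected before it runs (wrong argument count or types, or an inaccessible method). A minimal Java sketch of telling these apart is shown below. UDFInvoker is a hypothetical name, and wrapping both cases in IllegalStateException is only a placeholder for whatever error type Daffodil would actually raise.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class UDFInvoker {
    // Invokes evaluate on the function object and normalizes reflective failures.
    static Object call(Object functionObject, Method evaluate, Object... values) {
        try {
            return evaluate.invoke(functionObject, values);
        } catch (InvocationTargetException e) {
            // The UDF body threw: report its own exception, not the reflective wrapper.
            throw new IllegalStateException("UDF evaluate failed: " + e.getCause(), e.getCause());
        } catch (IllegalAccessException | IllegalArgumentException e) {
            // The call never reached the UDF: inaccessible method or mismatched arguments.
            throw new IllegalStateException("UDF evaluate could not be invoked: " + e.getMessage(), e);
        }
    }
}
```

Keeping the two cases separate would let the first be reported against the UDF author's code and the second against the schema's function call.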
Diagnostics
We intend to supply the user with at least the following errors/warnings:
- Warning: No annotation present in function class(es). Class of \[Provider_ClassName]. Class will be ignored.
- Error: No User Defined function class with specified name/namespace found. (List all names/namespaces with associated providers)
- Error: No evaluate function found in function class object
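As an illustration of where the per-class diagnostics could come from, the sketch below walks a provider's function classes and collects messages for the missing-annotation and missing-evaluate cases. UDFDiagnostics is a hypothetical name, the annotation is a stand-in with an assumed shape, and the exact message wording is only approximated from the list above.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for the Daffodil provided annotation (assumed shape).
@Retention(RetentionPolicy.RUNTIME)
@interface FunctionClassInfo { String name(); String namespace(); }

class UDFDiagnostics {
    // Collects one message per problem found in the given function classes.
    static List<String> check(String providerClassName, Class<?>... functionClasses) {
        List<String> messages = new ArrayList<>();
        for (Class<?> c : functionClasses) {
            if (c.getAnnotation(FunctionClassInfo.class) == null) {
                messages.add("Warning: No annotation present in function class " + c.getName()
                    + " of " + providerClassName + ". Class will be ignored.");
                continue; // an ignored class needs no further checks
            }
            boolean hasEvaluate = Arrays.stream(c.getMethods())
                .anyMatch(m -> m.getName().equals("evaluate"));
            if (!hasEvaluate) {
                messages.add("Error: No evaluate function found in function class " + c.getName());
            }
        }
        return messages;
    }
}
```

The remaining diagnostic, for a name/namespace that no provider offers, belongs at lookup time rather than here, since it depends on what the schema asks for.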
Testing
...
- Provider Checks
- No provider class
- No or empty functionClass list (i.e., getFunctionClasses is empty)
- Function Class Checks
- No evaluate function
- Annotations Checks
- No FunctionClassInfo annotation
- Elements not filled in/empty
...
| Focus | ID | Description | Test Data |
|---|---|---|---|
| ServiceLoader API | 1 | Tests when no providers are found by the ServiceLoader API due to a missing or empty META-INF file | No META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file on classpath of classLoader (CLI Test) |
| | 2 | Tests when an error is thrown from the ServiceLoader API | META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file contains a typo in the class name |
| Provider Class | 3 | Tests when UDF Provider has no function classes | UDF with no call to setFunctionClasses initializing functionClasses to array of classes |
| | 4 | Tests when UDF Provider has an empty function class list | UDF with call to setFunctionClasses initializing functionClasses to empty array of classes |
| Function Class | 6 | Tests when function classes don’t implement UserDefinedFunction interface | UDF with function class that doesn’t implement UserDefinedFunction interface |
| | 7 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have UserDefinedFunctionIdentification annotation |
| | 8 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation with empty fields |
| | 9 | Tests when function classes have no evaluate function | UDF with function class whose method isn’t named “evaluate” |
| | 10 | Tests when function can’t be found | Function call from schema with either a non-existent namespace or a non-existent name |
| Evaluate function | 11 | Tests when function class has an overloaded evaluate function | UDF with overloaded evaluate function |
| | 12 | Tests when the number of arguments is incorrect | Function call from schema with incorrect arg number |
| | 13 | Tests when argument types are incorrect | Function call from schema with incorrect arg type |
| | 14 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Calendar) |
| | 15 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays |
| | 16 | Tests UDF with no args | UDF with no param and static return type |
| | 17 | Tests UDF with no return type | UDF with void return type |
| Primitive Arg/Return Types Testing | 18 | Tests UDF with primitive int params and returns | UDF with primitive params and return |
| | 19 | Tests UDF with primitive byte params and returns | UDF with primitive params and return |
| | 20 | Tests UDF with primitive short params and returns | UDF with primitive params and return |
| | 21 | Tests UDF with primitive long params and returns | UDF with primitive params and return |
| | 22 | Tests UDF with primitive double params and returns | UDF with primitive params and return |
| | 23 | Tests UDF with primitive float params and returns | UDF with primitive params and return |
| | 24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return |
| Boxed Args/Return Type Testing | 25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return |
| | 26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return |
| | 27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return |
| | 28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return |
| | 29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return |
| | 30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return |
| | 31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return |
| Other Param/Return Types | 32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns |
| | 33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns |
| | 34 | Tests UDF with String params and returns | UDF with specified params and returns |
| | 35 | Tests UDF with Byte Array params and returns | UDF with specified params and returns |
| | 36 | Tests UDF with URI params and returns | UDF with specified params and returns |
...
Prototype
UDF Jars: HAEMSLConversions.jar and UDFunctionProviderImpl.jar. Both extend UDFunctionProvider.jar.
...