This proposal has been superseded by a new proposal.
Introduction
Some functionality is either unsupported by the DPath expression language or too difficult to express in it. Such functionality could be added to the Daffodil codebase, but some of it would be relevant to only a small number of datasets and would not see widespread use. Adding it also places the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user-defined functions (UDFs) in DPath expressions.
See the related Jira issue.
Use Cases/Examples
A trivial use case would be a replace function callable from a DPath expression. In the DFDL schema we might call it as shown below, where the function returns "Hello_World" if the element resolves to "Hello World".
...
```xml
dfdl:outputValueCalc="{ mhdf:convert_to_hae( xs:int(../elevation), xs:int(../hae_adjustment), xs:float(../scaling_factor)) }"
```
Requirements
- The UDF will be defined in a stand-alone library outside of the Daffodil codebase
- The UDF must be accessible to and callable from the Daffodil code
- Daffodil must be able to process and pass the return value from the UDF back to the Schema
- Support for UDFs in the DFDL schema must be language-agnostic, not specific to Java, Scala, or Daffodil
Proposed Solution
The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.
UDF Implementation
The implementer will be expected to implement at least two classes. The first is a "provider" class containing a lookup function that returns an initialized function object for a supplied namespace and name. This provider class acts as a traditional service provider, as described in the ServiceLoader API: its Java project should have a src/META-INF/services folder holding a file named after the fully qualified name of the UDFunctionProvider class that we supply, and that file should list the fully qualified name of the provider class. This makes the provider visible to the ServiceLoader API, so the function class object can be returned to Daffodil. The provider class is also expected to have access to all the function classes it is responsible for, and to initialize them as needed.
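A minimal sketch of such a provider follows. The shape of UDFunctionProvider (a single `Object lookupFunction(String namespace, String name)` method) is assumed from the lookup loop Daffodil uses in the Service Loader section; the namespace URI and the names `ExampleUDFProvider` and `ReplaceSpaces` are hypothetical.

```java
// Assumed shape of the provider interface Daffodil would supply
// (inferred from the fp.lookupFunction(namespace, name) call in Daffodil's lookup loop).
interface UDFunctionProvider {
  Object lookupFunction(String namespace, String name);
}

// Hypothetical function class owned by the provider.
class ReplaceSpaces {
  public String evaluate(String input) {
    return input.replace(" ", "_");
  }
}

// Hypothetical provider: creates function objects for the names it owns
// and returns null for everything else so Daffodil can try the next provider.
class ExampleUDFProvider implements UDFunctionProvider {
  static final String NAMESPACE = "urn:example:udf"; // hypothetical namespace URI

  @Override
  public Object lookupFunction(String namespace, String name) {
    if (!NAMESPACE.equals(namespace)) {
      return null; // not our namespace
    }
    if ("replace".equals(name)) {
      return new ReplaceSpaces(); // initialized on request
    }
    return null; // unknown function name in our namespace
  }
}
```

To register it, the provider jar would carry a file under META-INF/services named after the fully qualified name of UDFunctionProvider, whose single line is the fully qualified name of ExampleUDFProvider. Returning null for unknown names, rather than throwing, lets Daffodil move on to the next provider.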
...
```
getName: String
getNamespace: String
evaluate: Any // return type is function-dependent, but the method must be named evaluate
```
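For the replace use case, a function class might look like the sketch below. It assumes Daffodil locates these members purely by name through reflection, so no Daffodil supertype is required; the class name, the namespace URI, and the three-argument overload are hypothetical.

```java
// Hypothetical function class for the "replace" use case. Daffodil is assumed to
// find getName, getNamespace and evaluate by name via reflection.
class ReplaceFunction {
  public String getName() {
    return "replace";
  }

  public String getNamespace() {
    return "urn:example:udf"; // hypothetical namespace URI
  }

  // Use-case behaviour: "Hello World" becomes "Hello_World"
  public String evaluate(String input) {
    return input.replace(" ", "_");
  }

  // Overload: would be selected when the DPath call passes three strings
  public String evaluate(String input, String target, String replacement) {
    return input.replace(target, replacement);
  }
}
```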
Daffodil Service Loader
Daffodil will use the ServiceLoader API to discover UDF provider classes and return the requested function class object on demand.
...
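```java
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Member of the internal lookup class: discovers every UDFunctionProvider
// listed in META-INF/services on the classpath
private final ServiceLoader<UDFunctionProvider> loader =
    ServiceLoader.load(UDFunctionProvider.class);

public Object lookupFunctionClass(String namespace, String name) {
  Object funcObj = null;
  try {
    // Ask each provider in turn; stop at the first one that knows the function
    for (UDFunctionProvider fp : loader) {
      funcObj = fp.lookupFunction(namespace, name);
      if (funcObj != null) {
        break;
      }
    }
  } catch (ServiceConfigurationError | Exception e) {
    // ServiceConfigurationError (thrown when a provider cannot be loaded) is an
    // Error, not an Exception, so it must be caught explicitly
    e.printStackTrace();
  }
  return funcObj; // null means no provider recognized the function
}
```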
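The same first-non-null search can be written so that it does not depend on a services file being present, which makes it easy to exercise in isolation. This is a sketch rather than Daffodil's code: `UDFLookup` is a hypothetical name, and UDFunctionProvider is the assumed single-method interface used by the loop above.

```java
import java.util.ServiceLoader;

// Assumed shape of Daffodil's provider interface (see the loop above).
interface UDFunctionProvider {
  Object lookupFunction(String namespace, String name);
}

// Hypothetical helper: asks each provider in turn and returns the first non-null result.
final class UDFLookup {
  private final Iterable<? extends UDFunctionProvider> providers;

  UDFLookup(Iterable<? extends UDFunctionProvider> providers) {
    this.providers = providers;
  }

  // Production wiring: providers listed in META-INF/services files on the classpath.
  static UDFLookup fromClasspath() {
    return new UDFLookup(ServiceLoader.load(UDFunctionProvider.class));
  }

  Object lookupFunctionClass(String namespace, String name) {
    for (UDFunctionProvider fp : providers) {
      Object funcObj = fp.lookupFunction(namespace, name);
      if (funcObj != null) {
        return funcObj; // first provider that recognizes the function wins
      }
    }
    return null; // no provider knows this function; the caller raises the error
  }
}
```

In production the lookup would be built with `UDFLookup.fromClasspath()`; in a unit test it can be handed a plain list of providers, such as lambdas. Unlike the loop above, this sketch lets a provider's exception propagate instead of printing it and returning null, so a broken provider is not mistaken for an unknown function.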
Daffodil DPath Processing
Acquiring the UDFunctionClass
The internal class referenced above will be instantiated, and its lookup function called with the namespace and name from the functionQName object. If it returns null, we will throw an error; otherwise we will use the Reflection API to find the evaluate function and query its parameter types and return type. The implementation below can also be distributed
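```scala
// functionClassObject: the object returned by lookupFunctionClass
// argumentsFromDpath: the evaluated DPath argument values
val functionClass = functionClassObject.getClass
// Infer argument types; matching on parameter types supports overloaded evaluate functions
val pTypes = argumentsFromDpath.map { o => o.getClass() }
val fEvaluate = functionClass.getMethods.find { p =>
  p.getName == "evaluate" && p.getParameterTypes.sameElements(pTypes)
}.getOrElse {
  throw new NoSuchMethodException(
    s"${functionClass.getName} has no evaluate method taking ${pTypes.mkString(", ")}")
}
// Assumes a NodeInfo helper that maps a Java class to a DPath type
val dResultType = NodeInfo.fromObject(fEvaluate.getReturnType)
val dArgTypes = fEvaluate.getParameterTypes.map { NodeInfo.fromObject }.toList
```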
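The matching above compares the runtime classes of the DPath values with the declared parameter types exactly, so an evaluate declared with primitive parameters (`int`, `float`) would never match the boxed `Integer` and `Float` values it receives. The Java sketch below shows one way to handle that by boxing the declared types before comparing; `EvaluateResolver` and `SumFunction` are hypothetical names, and it assumes all argument values are non-null.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical resolver: finds the public evaluate method whose parameter types
// match the runtime classes of the (non-null, boxed) DPath argument values.
final class EvaluateResolver {
  private static final Map<Class<?>, Class<?>> BOXED = Map.of(
      boolean.class, Boolean.class, byte.class, Byte.class,
      short.class, Short.class, char.class, Character.class,
      int.class, Integer.class, long.class, Long.class,
      float.class, Float.class, double.class, Double.class);

  // Maps a primitive parameter type to its wrapper; other types are unchanged.
  private static Class<?> box(Class<?> type) {
    return type.isPrimitive() ? BOXED.get(type) : type;
  }

  static Method findEvaluate(Object functionObject, Object... argValues) {
    candidates:
    for (Method m : functionObject.getClass().getMethods()) {
      if (!m.getName().equals("evaluate") || m.getParameterCount() != argValues.length) {
        continue;
      }
      Class<?>[] params = m.getParameterTypes();
      for (int i = 0; i < params.length; i++) {
        if (box(params[i]) != argValues[i].getClass()) {
          continue candidates; // this overload does not match; try the next one
        }
      }
      return m;
    }
    throw new IllegalArgumentException("no evaluate method on "
        + functionObject.getClass().getName() + " matches the supplied arguments");
  }
}

// Hypothetical UDF with an overloaded evaluate, one version using primitives.
class SumFunction {
  public int evaluate(int a, int b) {
    return a + b;
  }

  public String evaluate(String a, String b) {
    return a + b;
  }
}
```

With this approach a UDF author could declare `evaluate(int, int, float)` for a call like the convert_to_hae example and still be matched; Method.invoke unboxes the Integer and Float values when it calls the method.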
Calling the UDFunctionClass
Within the DPath processing code, Daffodil will define two case classes, UDFunctionCallExpr and UDFunctionCall. The resultType and argsTypes arguments of UDFunctionCallExpr will be the dResultType and dArgTypes inferred from the function class as shown above. The function call Expression object will be defined as below.
...
```scala
import java.lang.reflect.Method

// Expression node for a UDF call; result and argument types come from reflection on evaluate
case class UDFunctionCallExpr(
  nameAsParsed: String,
  fnQName: RefQName,
  args: List[Expression],
  resultType: NodeInfo.Kind,
  argsTypes: List[NodeInfo.Kind],
  constructor: List[CompiledDPath] => RecipeOp)
  extends FunctionCallBase(nameAsParsed, fnQName, args) {

  lazy val argsToArgType = (args zip argsTypes).toMap

  override lazy val inherentType = resultType

  // Each argument expression is converted to the type declared by evaluate
  override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
    argsToArgType.get(childExpr) match {
      case Some(argType) => argType
      case None =>
        Assert.invariantFailed("subexpression %s is not an argument.".format(childExpr))
    }
  }

  override lazy val compiledDPath = {
    val recipes = args.map { _.compiledDPath }
    val res = new CompiledDPath(constructor(recipes) +: conversions)
    res
  }
}

// Runtime recipe: evaluates the argument recipes, then calls evaluate reflectively
case class UDFunctionCall(
  recipes: List[CompiledDPath],
  functionClassObject: Object,
  udfEvaluate: Method)
  extends FNArgsList(recipes) {

  override def computeValue(values: List[Any], dstate: DState) = {
    // Method.invoke takes Object varargs, so each value is upcast to AnyRef first
    val res = udfEvaluate.invoke(functionClassObject, values.map(_.asInstanceOf[AnyRef]): _*)
    res
  }
}
```
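Method.invoke wraps any exception thrown by the UDF in an InvocationTargetException, so reporting that exception directly would hide the real cause. The Java sketch below shows the invocation step: it spreads the argument list into invoke's varargs and rethrows with the UDF's own exception as the cause. `UdfInvoker` and `LabelFunction` are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical helper for the UDFunctionCall step: calls evaluate with the
// already-evaluated argument values and unwraps reflective exceptions.
final class UdfInvoker {
  static Object invokeEvaluate(Object functionObject, Method evaluate, List<?> values) {
    try {
      // toArray() yields Object[], which is passed straight through as invoke's varargs
      return evaluate.invoke(functionObject, values.toArray());
    } catch (InvocationTargetException e) {
      // The UDF itself threw: keep its exception as the cause so the message is not lost
      Throwable cause = e.getCause();
      throw new RuntimeException("UDF " + evaluate.getName() + " failed: " + cause, cause);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException("evaluate is not accessible on "
          + functionObject.getClass().getName(), e);
    }
  }
}

// Hypothetical UDF used to exercise the invoker.
class LabelFunction {
  public String evaluate(String prefix, Integer n) {
    if (n < 0) {
      throw new IllegalArgumentException("n must be non-negative");
    }
    return prefix + n;
  }
}
```

Keeping the original exception as the cause means a stack trace points into the UDF rather than into reflection internals.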
Prototype
UDFunctionProviderImpl.jar depends on UDFunctionProvider.jar.
...