This proposal has been superseded by a new proposal.

Introduction

Some functionality is either not supported by, or too difficult to express in, the DPath expression language. While such functionality could be added to the Daffodil codebase, some of it may apply only to a small set of data formats and would not see widespread use. Adding it to the codebase also places the burden of implementing it on the Daffodil developers. We would like Daffodil to be able to register and execute external, user-defined functions (UDFs) in DPath expressions.

See the associated Jira issue for additional information.

Use Cases/Examples

A trivial use case would be adding a replace function callable from a DPath expression. In the DFDL schema, we might write something like the following, where the function returns "Hello_World" if the element's value is "Hello World".

dfdl:choiceDispatchKey="{ sdf:replace(xs:string(../MessageTextFormatIdentifier), ' ', '_') }"

Another use case would be normalizing elevation above Mean Sea Level (MSL) to Height Above Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, we might write something like the following, where the function returns the result of the conversion.

dfdl:outputValueCalc="{ mhdf:convert_to_hae(
	xs:int(../elevation), 
	xs:int(../hae_adjustment), 
	xs:float(../scaling_factor)) }"

Requirements

  1. The UDF will be defined in a stand-alone library outside of the Daffodil codebase
  2. The UDF must be accessible to and callable from the Daffodil code
  3. Daffodil must be able to process the return value from the UDF and pass it back to the schema expression
  4. Support for UDFs in the DFDL schema must be language agnostic, not specific to Java, Scala, or Daffodil

Proposed Solution

The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.

UDF Implementation

The implementer will be expected to implement at least two classes. One is a "provider" class containing a lookup method that returns an initialized function object for a supplied name and namespace. This provider class acts as a traditional service provider, as described in the ServiceLoader API documentation: its Java project should have a META-INF/services directory (for example, src/META-INF/services) containing a file named with the fully qualified name of the UDFunctionProvider class that we supply, and that file should list the fully qualified name of the implementer's provider class. This makes the provider class visible to the ServiceLoader API, so the function class object can be returned to Daffodil. The provider class will also be expected to have access to all the function classes it is responsible for, and to be able to initialize them as needed.

public class MyUDFunctionProvider extends UDFunctionProvider {
	@Override
	public Object lookupFunctionClass(String namespace, String name) {
		// dispatch on "namespace_name"; each match returns a new function object
		switch (String.join("_", namespace, name)) {
			case "ns1_func1":
				return new func1();
			case "ns2_func2":
				return new func2();
			case "ns1_func3":
				return new func3_randomName();
			default:
				// unknown function: returning null lets Daffodil try the next provider
				return null;
		}
	}
}
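Joining the namespace and name with "_" makes the switch ambiguous: namespace "a_b" with name "c" and namespace "a" with name "b_c" both produce "a_b_c", and real namespaces are typically URIs that may contain underscores themselves. Below is a minimal Java sketch of an alternative provider that keys a nested map on namespace and then name. It assumes the abstract UDFunctionProvider has exactly the lookupFunctionClass(String, String) method described in this proposal; MapUDFunctionProvider, the Replace stub, and the namespace URI are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for the abstract provider class Daffodil would supply (assumed shape).
abstract class UDFunctionProvider {
    public abstract Object lookupFunctionClass(String namespace, String name);
}

// Hypothetical UDF class; its evaluate method is omitted here.
class Replace {
}

// Hypothetical provider keyed on namespace, then name, so distinct
// (namespace, name) pairs can never collide.
class MapUDFunctionProvider extends UDFunctionProvider {
    private final Map<String, Map<String, Supplier<Object>>> byNamespace = new HashMap<>();

    MapUDFunctionProvider() {
        register("http://example.com/sdf", "replace", Replace::new);
    }

    private void register(String namespace, String name, Supplier<Object> factory) {
        byNamespace.computeIfAbsent(namespace, ns -> new HashMap<>()).put(name, factory);
    }

    @Override
    public Object lookupFunctionClass(String namespace, String name) {
        Map<String, Supplier<Object>> functions = byNamespace.get(namespace);
        Supplier<Object> factory = functions == null ? null : functions.get(name);
        // null means "not mine", so the caller moves on to the next provider;
        // otherwise a new function object is created for each lookup.
        return factory == null ? null : factory.get();
    }
}
```

Like the switch version, this returns a fresh function object on every lookup; a provider could instead cache instances if its UDFs are stateless.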


The remaining classes contain the functionality of the UDFs, each embodied in an evaluate method. Instances of these classes are what the provider class initializes and returns. Each UDF class will be expected to implement the three methods listed below. Because the parameter types and return type of evaluate depend on the function, and Daffodil only relies on the method's name, we will not provide an abstract class for it. Each function that the implementer wishes to expose must be defined in a class with an evaluate method, and must be reachable through the provider's lookup method.

getName: String
getNamespace: String
evaluate: Any //Return type is function dependent, but must be called evaluate
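As an illustration of this contract, here is a sketch of a UDF class for the sdf:replace use case from the introduction. It assumes literal, non-regular-expression replacement (Java's String.replace); the namespace URI is hypothetical, and the class deliberately extends no Daffodil-supplied type.

```java
// Hypothetical UDF class for the sdf:replace example. It extends nothing;
// Daffodil would locate evaluate by name through reflection.
class Replace {

    // Hypothetical namespace URI that the schema would bind to the sdf prefix.
    public String getNamespace() {
        return "http://example.com/sdf";
    }

    public String getName() {
        return "replace";
    }

    // Replaces every literal occurrence of target with replacement.
    public String evaluate(String input, String target, String replacement) {
        return input.replace(target, replacement);
    }
}
```

With this class, the schema call sdf:replace(../MessageTextFormatIdentifier, ' ', '_') would resolve to evaluate(String, String, String).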

Daffodil Service Loader

Daffodil will use the ServiceLoader API to discover UDF provider classes and return the desired function class object on request.

As mentioned above, we will provide an abstract class, UDFunctionProvider, for UDF provider classes. It will contain a single abstract method, lookupFunctionClass, and will be externally available for implementers to extend. Daffodil will also have an internal class with a method that iterates over the providers found by the ServiceLoader and calls lookupFunctionClass on each. If no provider returns the function, the method returns null; otherwise it returns the function class object. This internal class will be implemented in Java, as the ServiceLoader proved difficult to use from Scala.

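import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// field of the internal class: finds every registered provider on the classpath
private final ServiceLoader<UDFunctionProvider> loader = ServiceLoader.load(UDFunctionProvider.class);

public Object lookupFunctionClass(String namespace, String name) {
	Object funcObj = null;
	try {
		// ask each provider in turn and stop at the first match
		for (UDFunctionProvider fp : loader) {
			funcObj = fp.lookupFunctionClass(namespace, name);
			if (funcObj != null) {
				break;
			}
		}
	} catch (ServiceConfigurationError | Exception e) {
		// ServiceConfigurationError is an Error, so catch (Exception) alone
		// would miss a misconfigured provider jar
		e.printStackTrace();
	}
	return funcObj;
}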
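A minimal Java sketch of the same lookup, restructured so it can be tested without building provider jars: the class accepts any Iterable of providers, and its no-argument constructor uses ServiceLoader, which reads the META-INF/services file named after UDFunctionProvider from each jar on the classpath. UDFunctionLookup is a hypothetical name, and unlike the snippet above this version lets exceptions propagate so the caller can wrap them.

```java
import java.util.ServiceLoader;

// Stand-in for the public abstract provider class (assumed shape).
abstract class UDFunctionProvider {
    public abstract Object lookupFunctionClass(String namespace, String name);
}

// Hypothetical internal lookup class.
class UDFunctionLookup {
    private final Iterable<? extends UDFunctionProvider> providers;

    // Production path: providers are discovered through META-INF/services.
    UDFunctionLookup() {
        this(ServiceLoader.load(UDFunctionProvider.class));
    }

    // Accepting any Iterable lets tests supply providers directly.
    UDFunctionLookup(Iterable<? extends UDFunctionProvider> providers) {
        this.providers = providers;
    }

    // Returns the first non-null result, or null if no provider knows the function.
    Object lookupFunctionClass(String namespace, String name) {
        for (UDFunctionProvider fp : providers) {
            Object funcObj = fp.lookupFunctionClass(namespace, name);
            if (funcObj != null) {
                return funcObj;
            }
        }
        return null;
    }
}
```

Provider order follows the iteration order of the Iterable, so when two providers claim the same function, the first one found wins.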

Daffodil DPath Processing

Acquiring the UDFunctionClass

The internal class referenced above will be instantiated, and its lookup method called with the namespace and name from the functionQName object. If it returns null, we'll throw an error; otherwise we'll use the Reflection API to find the evaluate method and query its parameter types and return type. The implementation below can also be distributed.

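val functionClass = functionClassObject.getClass

// infer argument types; this supports overloaded evaluate methods
// (getClass yields boxed types such as java.lang.Integer, so an evaluate
// declared with primitive parameters like int will not match)
val pTypes = argumentsFromDpath.map { o => o.getClass() }
val fEvaluate = functionClass.getMethods.find { p =>
	p.getName == "evaluate" && p.getParameterTypes.sameElements(pTypes)
}.getOrElse {
	// no evaluate matches these argument types: report an error
	throw new IllegalStateException("no matching evaluate method in " + functionClass.getName)
}

val dResultType = NodeInfo.fromObject(fEvaluate.getReturnType)
val dArgTypes = fEvaluate.getParameterTypes.toList.map { NodeInfo.fromObject }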
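Because getClass on a value always returns a reference type (java.lang.Integer, never int), exact matching against getParameterTypes fails for any evaluate declared with primitive parameters. A minimal Java sketch of one way around this, treating each primitive parameter as its wrapper class before comparing. EvaluateFinder and the Describe UDF are hypothetical, and this is still exact matching with no widening: an Integer argument will not select evaluate(long).

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that picks an overloaded evaluate by argument classes.
final class EvaluateFinder {
    private static final Map<Class<?>, Class<?>> BOXED = new HashMap<>();

    static {
        BOXED.put(boolean.class, Boolean.class);
        BOXED.put(byte.class, Byte.class);
        BOXED.put(short.class, Short.class);
        BOXED.put(char.class, Character.class);
        BOXED.put(int.class, Integer.class);
        BOXED.put(long.class, Long.class);
        BOXED.put(float.class, Float.class);
        BOXED.put(double.class, Double.class);
    }

    private EvaluateFinder() {
    }

    // Maps a primitive type to its wrapper; other classes are returned unchanged.
    static Class<?> boxed(Class<?> c) {
        return BOXED.getOrDefault(c, c);
    }

    // Returns the public evaluate whose parameter types, after boxing,
    // equal the argument classes exactly; null if there is none.
    static Method find(Class<?> functionClass, Class<?>... argClasses) {
        for (Method m : functionClass.getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equals("evaluate") || params.length != argClasses.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (boxed(params[i]) != argClasses[i]) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return m;
            }
        }
        return null;
    }
}

// Hypothetical UDF with two overloads of evaluate.
class Describe {
    public String evaluate(int value) {
        return "int:" + value;
    }

    public String evaluate(String value) {
        return "string:" + value;
    }
}
```

The selected Method's getReturnType and getParameterTypes then supply the inputs for dResultType and dArgTypes.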

Calling the UDFunctionClass

Within the DPath processing code, Daffodil will define two case classes: UDFunctionCallExpr and UDFunctionCall. The dResultType and dArgTypes arguments for UDFunctionCallExpr will be inferred from the function class as shown above. The functionObject of the Expression will be set to the following:

UDFunctionCallExpr(functionQNameString, functionQName, args, dResultType, dArgTypes, UDFunctionCall(_, functionClassObject, fEvaluate))


UDFunctionCallExpr will extend Daffodil's FunctionCallBase and override inherentType, targetTypeForSubexpression, and compiledDPath. compiledDPath will build a UDFunctionCall recipe from the compiled arguments, and UDFunctionCall will override computeValues to call fEvaluate through Method.invoke.


case class UDFunctionCallExpr(
	nameAsParsed: String,
	fnQName: RefQName,
	args: List[Expression],
	resultType: NodeInfo.Kind,
	argsTypes: List[NodeInfo.Kind],
	constructor: List[CompiledDPath] => RecipeOp)
	extends FunctionCallBase(nameAsParsed, fnQName, args) {

	// each argument expression maps to the DPath type its evaluate parameter requires
	lazy val argsToArgType = (args zip argsTypes).toMap
	override lazy val inherentType = resultType

	override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
		argsToArgType.get(childExpr) match {
			case Some(argType) => argType
			case None =>
				Assert.invariantFailed("subexpression %s is not an argument.".format(childExpr))
		}
	}

	override lazy val compiledDPath = {
		val recipes = args.map { _.compiledDPath }
		val res = new CompiledDPath(constructor(recipes) +: conversions)
		res
	}
}

import java.lang.reflect.Method

case class UDFunctionCall(recipes: List[CompiledDPath], functionClassObject: Object, udfEvaluate: Method)
	extends FNArgsList(recipes) {

	override def computeValues(values: List[Any], dstate: DState) = {
		// Method.invoke takes Object varargs, so box each value to AnyRef first
		val res = udfEvaluate.invoke(functionClassObject, values.map { _.asInstanceOf[AnyRef] }: _*)
		res
	}
}
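Method.invoke takes its arguments as an Object[] and, if the UDF itself throws, delivers that exception wrapped in InvocationTargetException. A minimal Java sketch of an invocation wrapper that unwraps the UDF's own exception and reports reflection failures separately, in the spirit of the Design-for-Test comment below. UDFException, UDFInvoker, and the Doubler and Failing UDFs are hypothetical, not existing Daffodil classes.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical exception type for UDF failures.
class UDFException extends RuntimeException {
    UDFException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class UDFInvoker {
    private UDFInvoker() {
    }

    // Calls evaluate reflectively; primitive results come back boxed.
    static Object invoke(Object functionClassObject, Method udfEvaluate, List<?> values) {
        try {
            // toArray() yields the Object[] that Method.invoke expects
            return udfEvaluate.invoke(functionClassObject, values.toArray());
        } catch (InvocationTargetException e) {
            // The UDF threw: report its exception rather than the reflection wrapper.
            throw new UDFException("UDF " + udfEvaluate.getName() + " threw an exception", e.getCause());
        } catch (IllegalAccessException | IllegalArgumentException e) {
            // evaluate was inaccessible or the values did not fit its signature.
            throw new UDFException("could not call UDF " + udfEvaluate.getName(), e);
        }
    }
}

// Hypothetical UDFs that exercise the success and failure paths.
class Doubler {
    public int evaluate(int x) {
        return 2 * x;
    }
}

class Failing {
    public String evaluate(String s) {
        throw new IllegalStateException("bad input: " + s);
    }
}
```

A UDF like Failing is exactly the kind of deliberately broken function that lets tests confirm exceptions are caught and reported.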

Prototype

UDFunctionProviderImpl.jar depends on UDFunctionProvider.jar.

MockDaffodil.jar depends on UDFunctionProviderImpl.jar and UDFunctionProvider.jar.

MockDaffodil is a Scala application that also contains a Java class that uses the ServiceLoader.

Attachments: UDFunctionProvider.jar, UDFunctionProviderImpl.jar, MockDaffodil.jar


Comments

  1. The code seems to infer the argument types, but the examples all have explicit DPath cast functions being called. Are those xs:float(...) argument conversion calls actually required, or will the DPath implementation know the argument types of the function, so that it can ensure each argument has that type as it does for built-in functions?

    I ask because a goal for any UDF facility in any system is that once defined, a UDF call is as indistinguishable from a built-in function as possible.

    1. The cast arguments aren't necessary. I just put them in there so the reader would know what the types were. The code will be expected to infer the types of the arguments and then use those to correctly identify the evaluate function via Reflection. The current implementation infers the arg types by calling getClass on each argument.

  2. Design-for-Test:

    When this code gets to code review with tests, one of the key things will be that we get decent diagnostics for all the things users can do wrong in creating the UDF jar, such as the jar not being found, not having all the classes in it, or the classes not being named right or derived properly. As a first cut, errors should throw a Daffodil-defined exception object that encapsulates whatever exception the underlying UDF reflection code or ServiceLoader API call throws.

    Regression testing this UDF facility will require some trickery, as some tests will not be ordinary JUnit-style tests, as incorrectly-constructed jars have to be created and used.

    I recommend that we add some "official" UDFs to daffodil that are always part of the standard build, not because they are useful, but because they allow us to test the UDF system.

    An important test case is also to define such a UDF which throws an exception, so that we can have tests that verify the exception is properly caught and reported.

  3. Our service provider only offers a lookup function. We might also want it to offer a way of querying a list of all defined functions.
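One way the listing suggested in the last comment could look, as a minimal Java sketch: the provider class gains a concrete listFunctions method with an empty default, so providers written before it existed still compile. FunctionId, listFunctions, ExampleProvider, FunctionCatalog, and the namespace URI are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Identifies one UDF by namespace URI and local name (hypothetical type).
final class FunctionId {
    final String namespace;
    final String name;

    FunctionId(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Clark notation, e.g. {http://example.com/sdf}replace
    @Override
    public String toString() {
        return "{" + namespace + "}" + name;
    }
}

// Provider class with a hypothetical listing method added.
abstract class UDFunctionProvider {
    public abstract Object lookupFunctionClass(String namespace, String name);

    // Concrete with an empty default, so older providers advertise nothing.
    public List<FunctionId> listFunctions() {
        return Collections.emptyList();
    }
}

// Hypothetical UDF class used by the example provider.
class Replace {
}

class ExampleProvider extends UDFunctionProvider {
    private static final String NS = "http://example.com/sdf";

    @Override
    public Object lookupFunctionClass(String namespace, String name) {
        return NS.equals(namespace) && "replace".equals(name) ? new Replace() : null;
    }

    @Override
    public List<FunctionId> listFunctions() {
        return Collections.singletonList(new FunctionId(NS, "replace"));
    }
}

// Gathers every function advertised by every provider, e.g. to include in
// a "function not found" diagnostic.
final class FunctionCatalog {
    private FunctionCatalog() {
    }

    static List<String> describeAll(Iterable<? extends UDFunctionProvider> providers) {
        List<String> out = new ArrayList<>();
        for (UDFunctionProvider p : providers) {
            for (FunctionId id : p.listFunctions()) {
                out.add(id.toString());
            }
        }
        return out;
    }
}
```

Making listFunctions concrete rather than abstract is a compatibility choice; an abstract version would force every provider to keep its lookup and its listing in sync.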