A prior version of the proposal can be found here.


Introduction

Some functionality is either unsupported or too difficult to express in the DPath expression language. While the desired functionality could be added to the codebase, some of it may apply to only a small dataset and won't see widespread use. Adding it to the codebase also leaves the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user-defined functions (UDFs) in DPath expressions.

Use Cases/Examples

A trivial use case would be adding a replace function callable from a DPath expression. In the DFDL schema, we might write something like the following, where the function returns "Hello_World" if the element resolves to "Hello World".

xmlns:sdf="com.ext.UDFunction.StringFunctions"
...
dfdl:choiceDispatchKey="{ sdf:replace(xs:string(../MessageTextFormatIdentifier), ' ', '_') }"

The function class would look something like the following:

@FunctionClassInfo(
	name = "replace", 
	namespace = "com.ext.UDFunction.StringFunctions" 
)
public class Replace {
	public String evaluate(String orig, String pre, String post) {
		//implementation...
	}
}
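
As a minimal sketch of how the elided implementation might be filled in, the example below assumes that FunctionClassInfo is a runtime-retained annotation with name and namespace elements, and that replace substitutes every literal occurrence of pre with post. The annotation definition is a stand-in rather than Daffodil's actual class, and the classes are package-private only so that the sketch fits in a single file.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for the Daffodil-provided annotation; the real definition may differ.
// Runtime retention is assumed so the annotation can be read through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FunctionClassInfo {
	String name();
	String namespace();
}

@FunctionClassInfo(
	name = "replace",
	namespace = "com.ext.UDFunction.StringFunctions"
)
class Replace {
	// Assumed semantics: replace every literal occurrence of pre with post.
	public String evaluate(String orig, String pre, String post) {
		return orig.replace(pre, post);
	}
}
```

With these assumptions, calling evaluate with "Hello World", " " and "_" yields "Hello_World", matching the schema example above.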


Another use case would be normalizing elevation above Mean Sea Level (MSL) to Height Above Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, we might write something like the following, where the function returns the result of the conversion.

xmlns:mhdf="http://extOther.UDFunction.ElevationConversions.com"
...
dfdl:outputValueCalc="{ mhdf:convert_to_hae(../elevation, ../hae_adjustment, ../scaling_factor) }"

The function class would look something like the following:

@FunctionClassInfo(
	name = "convert_to_hae", 
	namespace = "http://extOther.UDFunction.ElevationConversions.com" 
)
public class MSLConversions {
	public Float evaluate(Integer elevation_msl_raw, Integer hae_adjustment_raw, Float scaling_factor) {
		//implementation..
	}
}

Requirements

  1. The UDF will be defined in a stand-alone library outside of the Daffodil codebase
  2. The UDF must be accessible to and callable from the Daffodil code
  3. Daffodil must be able to process and pass the return value from the UDF back to the Schema
  4. The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific

Proposed Solution

The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.

Daffodil Provided Jar

Daffodil will provide a UDFunctionProvider abstract class and an annotation class FunctionClassInfo.

The UDFunctionProvider class will hold a private array of all the function classes the provider is aware of, with a setter and a getter for it, and will declare a lookupFunctionClass method that returns a function class object. The implementer is expected to extend this class for each provider class they supply.
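
To make the description concrete, here is one possible shape for the base class. It is a sketch inferred from the paragraph above, not the actual Daffodil source: the defensive copies in the getter and setter and the empty default array are assumptions, while the method names follow the samples elsewhere in this proposal.

```java
// Hypothetical shape of the Daffodil-provided base class.
abstract class UDFunctionProvider {
	// Function classes this provider is aware of; empty until the subclass sets them.
	private Class<?>[] functionClasses = new Class<?>[0];

	public Class<?>[] getFunctionClasses() {
		// Copy so callers cannot modify the provider's own array.
		return functionClasses.clone();
	}

	public void setFunctionClasses(Class<?>[] functionClasses) {
		this.functionClasses = functionClasses.clone();
	}

	// Returns an initialized function class object for the given namespace and
	// name, or null when this provider has no matching function class.
	public abstract Object lookupFunctionClass(String namespace, String name);
}
```

Keeping lookupFunctionClass abstract leaves object creation, and any shared state, entirely in the implementer's hands.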

The FunctionClassInfo annotation must be applied to each function class. Its name and namespace elements must be filled in so that the function class can be properly identified by the provider when queried by Daffodil.

UDF Implementation

The implementer will be expected to implement at least two classes: a provider class and at least one function class.

The provider class will extend the Daffodil-provided UDFunctionProvider class. It will contain a lookup function that returns an initialized object based on a supplied namespace and name. This class will act as a traditional service provider as explained in the ServiceLoader API: its Java project should have a src/META-INF/services folder containing a file named after the fully qualified name of the UDFunctionProvider class that we supply, and that file lists the fully qualified name of the provider class. Through that file, the provider will be made visible to the ServiceLoader API and the function class object can be returned to Daffodil. The provider will also be expected to have access to all the function classes it is responsible for, maintain a list of these classes and be able to initialize them as needed. It can also provide state to each function class it is in charge of, for example a database connection or a counter. A sample is provided below.

public class myUDFunctionProvider extends UDFunctionProvider {
	public myUDFunctionProvider() {
		// any provider state (e.g. a database connection or counter) can be set up here...
		super.setFunctionClasses(new Class<?>[] { func1.class, func2.class, func3_randomName.class });
	}

	@Override
	public Object lookupFunctionClass(String namespace, String name) {
		// the key joins the namespace and name given in each function class's annotation
		switch (String.join("_", namespace, name)) {
			case "ns1_func1":
				return new func1();
			case "ns2_func2":
				return new func2();
			case "ns1_func3":
				return new func3_randomName();
			default:
				// no function class for this namespace/name
				return null;
		}
	}
}


The function classes will contain the functionality of the UDF, embodied in an evaluate method. A function class is what will be initialized and returned by the provider class. Each function class will be expected to implement an evaluate method and to carry the Daffodil-provided FunctionClassInfo annotation. Because the parameter types and the return type of the evaluate function depend on the functionality, and we really only care about the method's name, we will not provide an abstract class for it. Each function that the implementer wishes to expose must be defined in a class with an evaluate function, and must be available to the lookup function. See Proposal: User Defined Functions for a sample function class.

Daffodil Service Loader

Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.

Daffodil will have an internal Java class with a function that uses the ServiceLoader iterator to aggregate all the provider classes, the function class lists they provide, and the lookupFunctionClass methods they implement. This class will have its own function class list, aggregated from the function classes of every provider it is aware of. It will also have a map that links the namespace and name of each function class to its provider object. Its lookupFunctionClass method will use this map to look up the right provider to call to get the function class object. If no function class is found, it will return null. This internal class will be implemented in Java because the ServiceLoader proved difficult to use in Scala.

public Object lookupFunctionClass(String namespace, String name) {
	Object funcObj = null;

	try {
		UDFunctionProvider fp = this.functionProviderLookup.get(String.join("_", namespace, name));
		if (fp != null ) {
			funcObj = fp.lookupFunctionClass(namespace, name);
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
	return funcObj;
}
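
The aggregation step described above, which fills functionProviderLookup, could be sketched as follows. The class name UDFunctionService, the fromClasspath factory and the demo provider are hypothetical, and the annotation and base class are compact stand-ins so the sketch compiles on its own. Accepting any Iterable of providers, rather than calling ServiceLoader directly in the constructor, is a design choice made here so the logic can be exercised without a META-INF/services file.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Stand-ins for the Daffodil-provided types; the real definitions may differ.
@Retention(RetentionPolicy.RUNTIME)
@interface FunctionClassInfo { String name(); String namespace(); }

abstract class UDFunctionProvider {
	private Class<?>[] functionClasses = new Class<?>[0];
	public Class<?>[] getFunctionClasses() { return functionClasses; }
	public void setFunctionClasses(Class<?>[] classes) { functionClasses = classes; }
	public abstract Object lookupFunctionClass(String namespace, String name);
}

// Hypothetical internal class that aggregates providers.
class UDFunctionService {
	private final Map<String, UDFunctionProvider> functionProviderLookup = new HashMap<>();

	// Production path: discover providers registered under META-INF/services.
	static UDFunctionService fromClasspath() {
		return new UDFunctionService(ServiceLoader.load(UDFunctionProvider.class));
	}

	UDFunctionService(Iterable<? extends UDFunctionProvider> providers) {
		for (UDFunctionProvider fp : providers) {
			for (Class<?> fc : fp.getFunctionClasses()) {
				FunctionClassInfo info = fc.getAnnotation(FunctionClassInfo.class);
				if (info == null) {
					continue; // unannotated function classes are ignored
				}
				functionProviderLookup.put(String.join("_", info.namespace(), info.name()), fp);
			}
		}
	}

	public Object lookupFunctionClass(String namespace, String name) {
		UDFunctionProvider fp = functionProviderLookup.get(String.join("_", namespace, name));
		return fp == null ? null : fp.lookupFunctionClass(namespace, name);
	}
}

// Demo function class and provider, for illustration only.
@FunctionClassInfo(name = "upper", namespace = "urn:demo")
class Upper {
	public String evaluate(String s) { return s.toUpperCase(); }
}

class DemoProvider extends UDFunctionProvider {
	DemoProvider() { setFunctionClasses(new Class<?>[] { Upper.class }); }
	@Override
	public Object lookupFunctionClass(String namespace, String name) {
		return "urn:demo".equals(namespace) && "upper".equals(name) ? new Upper() : null;
	}
}
```

Passing a list containing a DemoProvider to the constructor and looking up "urn:demo" and "upper" returns an Upper instance, while an unknown name returns null.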

Daffodil DPath Processing

Acquiring the UDFunctionClass

The internal class referenced above will be instantiated, and its lookup function called with the namespace and name from the functionQName object. If it returns null, we'll throw an error; otherwise we'll use the Reflection API to find the evaluate function and to query its parameter types and return type. The below implementation can also be distributed

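val functionClass = functionClassObject.getClass

// infer argument types; this implementation supports overloading the evaluate function
val pTypes = argumentsFromDpath.map { o => o.getClass() }
// find returns an Option, so a missing evaluate method must be handled before use
// (the exception type here is illustrative)
val fEvaluate = functionClass.getMethods.find { p =>
	p.getName == "evaluate" && p.getParameterTypes.sameElements(pTypes)
}.getOrElse(throw new NoSuchMethodException("No evaluate function found in function class object"))

val dResultType = NodeInfo.fromObject(fEvaluate.getReturnType)
val dArgTypes = fEvaluate.getParameterTypes.map { NodeInfo.fromObject }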
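
The same reflective lookup can be sketched in plain Java. One detail is worth noting: getClass on an argument value always reports the boxed type (Integer, not int), so a direct comparison against getParameterTypes only matches evaluate methods declared with primitive parameters if the primitive types are mapped to their wrappers first. The sketch below assumes that mapping is wanted. EvaluateLookup and AddOne are hypothetical names, and null arguments are not handled.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of Daffodil.
class EvaluateLookup {
	// Primitive parameter types mapped to the wrapper types that getClass() reports.
	private static final Map<Class<?>, Class<?>> BOXED = new HashMap<>();
	static {
		BOXED.put(int.class, Integer.class);
		BOXED.put(byte.class, Byte.class);
		BOXED.put(short.class, Short.class);
		BOXED.put(long.class, Long.class);
		BOXED.put(double.class, Double.class);
		BOXED.put(float.class, Float.class);
		BOXED.put(boolean.class, Boolean.class);
	}

	private static Class<?>[] boxAll(Class<?>[] types) {
		return Arrays.stream(types).map(t -> BOXED.getOrDefault(t, t)).toArray(Class<?>[]::new);
	}

	// Finds a public evaluate method whose (boxed) parameter types equal the argument types.
	static Optional<Method> findEvaluate(Object functionObject, Object... args) {
		Class<?>[] argTypes = Arrays.stream(args).map(Object::getClass).toArray(Class<?>[]::new);
		return Arrays.stream(functionObject.getClass().getMethods())
			.filter(m -> m.getName().equals("evaluate"))
			.filter(m -> Arrays.equals(boxAll(m.getParameterTypes()), argTypes))
			.findFirst();
	}

	// Looks up evaluate and invokes it, wrapping reflection failures in an unchecked exception.
	static Object call(Object functionObject, Object... args) {
		Method evaluate = findEvaluate(functionObject, args).orElseThrow(() ->
			new IllegalArgumentException("No evaluate function found in function class object"));
		try {
			return evaluate.invoke(functionObject, args);
		} catch (ReflectiveOperationException e) {
			throw new IllegalStateException(e);
		}
	}
}

// Demo function class with a primitive parameter and return type.
class AddOne {
	public int evaluate(int x) { return x + 1; }
}
```

Because exact type equality is used, an argument whose runtime class is a subclass of the declared parameter type would not match; a fuller implementation might use isAssignableFrom instead.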

Calling the UDFunctionClass

Within the DPath processing code, Daffodil will define two case classes, UDFunctionCallExpr and UDFunctionCall. The dResultType and dArgTypes arguments for UDFunctionCallExpr will be inferred from the functionClass as shown above. The Expression functionObject will be set to the following:

UDFunctionCallExpr(functionQNameString, functionQName, args, dResultType, dArgTypes, UDFunctionCall(_, functionClassObject, fEvaluate))


UDFunctionCallExpr will extend Daffodil's FunctionCallBase and override inherentType, targetTypeForSubexpression and compiledDPath. compiledDPath will call UDFunctionCall. UDFunctionCall will override computeValues, calling fEvaluate through its invoke method.


case class UDFunctionCallExpr(nameAsParsed: String, fnQName: RefQName, args: List[Expression],
resultType: NodeInfo.Kind, argsTypes: List[NodeInfo.Kind], constructor: List[CompiledDPath] => RecipeOp)
	extends FunctionCallBase(nameAsParsed, fnQName, args) {

	lazy val argsToArgType = (args zip argsTypes).toMap
	override lazy val inherentType = resultType

	override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
		argsToArgType.get(childExpr) match {
			case Some(argType) => argType
			case None => {
				Assert.invariantFailed("subexpression %s is not an argument.".format(childExpr))
			}
		}
	}
	override lazy val compiledDPath = {
		val recipes = args.map { _.compiledDPath }
		val res = new CompiledDPath(constructor(recipes) +: conversions)
		res
	}
}

case class UDFunctionCall(recipes: List[CompiledDPath], functionClassObject: Object, udfEvaluate: Method)
	extends FNArgsList(recipes) {

	override def computeValues(values: List[Any], dstate: DState) = {
		// Method.invoke takes Object varargs, so each value is cast to AnyRef first
		val res = udfEvaluate.invoke(functionClassObject, values.map(_.asInstanceOf[AnyRef]): _*)
		res
	}
}

Diagnostics

We intend to supply the user with at least the following errors/warnings:

  • Warning: No annotation present in function class(es) of [Provider_ClassName]. Class will be ignored.
  • Error: No User Defined function class with specified name/namespace found. (List all names/namespaces with associated providers)
  • Error: No evaluate function found in function class object

Testing

Focus | ID | Description | Test Data | Expected Result
ServiceLoader API | 1 | Tests when there are no providers found by the ServiceLoader API due to missing or empty META-INF file | No META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file on classpath of classLoader (CLI Test) | “No user defined functions found.” error
ServiceLoader API | 2 | Tests when there is an error thrown from ServiceLoader API | META-INF/services/org.apache.daffodil.udf.UDFunctionProvider file contains typo in class name | “ServiceConfigurationError” error
Provider Class | 3 | Tests when UDF Provider has no function classes | UDF with no call to setFunctionClasses initializing functionClasses to array of classes | Warning: Provider Ignored...No function classes found
Provider Class | 4 | Tests when UDF Provider has empty function class | UDF with call to setFunctionClasses initializing functionClasses to empty array of classes | Warning: Provider Ignored...No function classes found
Provider Class | 5 | Tests when provider doesn’t implement UDFunctionProvider | UDF with Provider that doesn’t extend UDFunctionProvider class | Provider not “seen” by Daffodil
Function Class | 6 | Tests when function classes don’t implement Serializable interface | UDF with function class that doesn’t implement Serializable interface | Warning: Provider Ignored...FunctionClass(es) must implement java.io.Serializable
Function Class | 7 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have function class annotation | Warning: Provider Ignored...Annotations missing for FunctionClass(es)
Function Class | 8 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation with empty fields | FunctionClass ignored: … “Annotation name field is empty or invalid.” and/or “Annotation namespace field is empty or invalid.”
Function Class | 9 | Tests when function classes have no evaluate function | UDF with function class whose method isn’t named “evaluate” | “Missing evaluate method for function provided” error
Function Class | 10 | Tests when function can’t be found | Function call from schema with either non-existent namespace or name | “Function not found” error
Evaluate function | 11 | Tests when function class has overloaded evaluate function | UDF with overloaded evaluate function | “Only one evaluate method allowed per function class” error
Evaluate function | 12 | Tests when number of arguments is incorrect | Function call from schema with incorrect arg number | IllegalArgumentException
Evaluate function | 13 | Tests when argument types are incorrect | Function call from schema with incorrect arg type | IllegalArgumentException
Evaluate function | 14 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Calendar) | “Unsupported object representation type” error
Evaluate function | 15 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays | “Unsupported object representation type” error
Evaluate function | 16 | Tests UDF with no args | UDF with no param and static return type | Returns value of static return
Evaluate function | 17 | Tests UDF with no return type | UDF with void return type | Error: "Evaluate method for function provided cannot be void"
Primitive Arg/Return Types Testing | 18 | Tests UDF with primitive int params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 19 | Tests UDF with primitive byte params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 20 | Tests UDF with primitive short params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 21 | Tests UDF with primitive long params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 22 | Tests UDF with primitive double params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 23 | Tests UDF with primitive float params and returns | UDF with primitive params and return | Return specified type
Primitive Arg/Return Types Testing | 24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return | Return specified type
Boxed Args/Return Type Testing | 25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return | Return specified type
Boxed Args/Return Type Testing | 31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return | Return specified type
Other Param/Return Types | 32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns | Return specified type
Other Param/Return Types | 33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns | Return specified type
Other Param/Return Types | 34 | Tests UDF with String params and returns | UDF with specified params and returns | Return specified type
Other Param/Return Types | 35 | Tests UDF with Byte Array params and returns | UDF with specified params and returns | Return specified type
Other Param/Return Types | 36 | Tests UDF with URI params and returns | UDF with specified params and returns | Return specified type

Prototype

UDF jars: HAEMSLConversions.jar and UDFunctionProviderImpl.jar. Both build on UDFunctionProvider.jar.

MockDaffodil.jar contains a Scala app along with a Java class that uses ServiceLoader. It needs UDFunctionProviderImpl.jar and UDFunctionProvider.jar.

