This proposal is replaced by Proposal: Simplified Feature to Support Enumerations

The simplification was made because these features are very complex and, thus far, mostly unused in DFDL schemas.

However, this proposal reflects the feature as implemented in Daffodil 3.4.0. Substantial parts have existed since Daffodil 2.4.0; several of these are in use by Daffodil users, and so are assumed to be carried forward in future Daffodil releases.

A prior proposal is available here.

Introduction

Much data contains numeric values that are enumerations, where each value is associated with a logical string that provides a meaningful symbolic interpretation of it.

...

This proposal provides an alternative mechanism by introducing a new notion, dfdlx:inputTypeCalc and dfdlx:outputTypeCalc, which are analogous to inputValueCalc and outputValueCalc except that they are associated with types, not elements, and that they compose with preexisting parsing behaviors.

Using this notion, this proposal will then introduce a specific construct, KeySet-Value maps, to allow an efficient implementation of enum lookups using the inputTypeCalc and outputTypeCalc concepts.

This proposal will then provide some additional constructs to support a wider array of use cases and discuss how it can be integrated with other DFDL features, particularly xs:choice elements and inputValueCalc/outputValueCalc.

Theory

Before discussing the concrete implementation, it is worth considering the theoretical structure that is being proposed abstractly.

...

The above example defining t2[B] would suggest that the domain of inputTypeCalc is all of A, which is all logical values associated with the repType t1[A]. However, it is often useful for inputTypeCalc to be defined for only a subset of A. For instance, suppose t1[Int] is all 8-bit unsigned integers representing error codes, and t2[String] is a human readable description of said codes. If there are only 100 codes defined, then it might make sense to define inputTypeCalc over only values 0-99. Similarly, outputTypeCalc need not be defined for all strings, just those which may be returned by inputTypeCalc (although it might be desirable to define outputTypeCalc over a broader domain to better support edited infosets). To support this, we allow inputTypeCalc and outputTypeCalc to be partial functions, and refer to the domain of inputTypeCalc as the repValues of t2.

Representing Transforms

This proposal does not actually specify transforms independently, but as part of the specification of a new type.

Identity Transform

Suppose we have an existing type t1[A] and we want to define a new type, t2[A], with the trivial identity transforms. We may do this by defining t2 as a new XSD simpleType with base A, and adding the dfdlx:repType annotation to specify the repType as t1.

Code Block
<xs:simpleType name='t2' dfdlx:repType='t1'>
  <xs:restriction base='A' />
</xs:simpleType>

This is not particularly useful, but will serve as a base for more complicated transforms.

Restriction Transform

A less pointless variant of the identity transform is the restriction transform.  The restriction transform behaves like the identity transform except it restricts the set of repValues.

Code Block
<xs:simpleType name='t2' dfdlx:repType='t1'>
  <xs:restriction base='A'>
    <xs:minInclusive value='1'/>
    <xs:maxInclusive value='10'/>
  </xs:restriction>
</xs:simpleType>

As you can see, we accomplish this using the standard XSD restriction feature. This has the added benefit that XML validators that are not DFDL-aware will automatically be aware of the restriction on the legal values of the resulting type.

KeySet-Value Transform

The KeySet-Value transforms are central to the support of enumerations. Abstractly, a KeySet-Value transform is defined by a set of (keyset, canonicalKey, value) tuples, where each canonicalKey is a member of the corresponding keyset, all values are unique, and all keysets are mutually disjoint. The transform is then defined by:

...

This is specified in the schema by defining t2[B] as an XSD enumeration of type B. On each enumeration value, we use DFDL annotations to specify one or more keys (or repValues) to associate with it. There are two ways to specify repValues: the dfdlx:repValues annotation is a space-delimited list of values, and the dfdlx:repValueRanges annotation is a space-separated list of integers that is interpreted as “min1 max1 min2 max2 … minN maxN”, representing the union of all intervals [minK, maxK]. The repValue set of t2 is the union of the sets specified by these two methods. For example:

Code Block
<xs:simpleType name="fruitEnumType" dfdlx:repType="tns:fruitRepType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Apple" dfdlx:repValues="0" />
    <xs:enumeration value="Banana" dfdlx:repValues="1" />
    <xs:enumeration value="Disused" dfdlx:repValues="11 13 15" />
    <xs:enumeration value="Illegal"
      dfdlx:repValues="12 14"
      dfdlx:repValueRanges="3 10 16 255" />
  </xs:restriction>
</xs:simpleType>

The canonical repValue is the first value specified by dfdlx:repValues or, if dfdlx:repValues is not present, the first value specified by dfdlx:repValueRanges.
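To make the lookup semantics concrete, the following Python sketch models the fruitEnumType example as two tables: one from every repValue to its logical value (the parse direction) and one from each logical value to its canonical repValue (the unparse direction). It is only an illustration of the rules stated above, not Daffodil's implementation; the names expand_rep_values and build_transform are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_rep_values(rep_values: str = "", rep_value_ranges: str = "") -> List[int]:
    """Expand dfdlx:repValues and dfdlx:repValueRanges strings into an ordered key list.

    Keys from repValues come first, so keys[0] is the canonical repValue.
    """
    keys = [int(tok) for tok in rep_values.split()]
    bounds = [int(tok) for tok in rep_value_ranges.split()]
    if len(bounds) % 2 != 0:
        raise ValueError("repValueRanges needs an even number of integers")
    for low, high in zip(bounds[0::2], bounds[1::2]):
        if low > high:
            raise ValueError(f"empty interval [{low}, {high}]")
        keys.extend(range(low, high + 1))  # intervals are inclusive
    return keys


def build_transform(
    entries: List[Tuple[str, str, str]]
) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the parse and unparse tables from (value, repValues, repValueRanges)."""
    input_calc: Dict[int, str] = {}   # repValue -> logical value
    output_calc: Dict[str, int] = {}  # logical value -> canonical repValue
    for value, rep_values, rep_value_ranges in entries:
        keys = expand_rep_values(rep_values, rep_value_ranges)
        if not keys:
            raise ValueError(f"{value!r} has no repValues")
        if value in output_calc:
            raise ValueError(f"duplicate logical value {value!r}")
        output_calc[value] = keys[0]
        for key in keys:
            if key in input_calc:
                raise ValueError(f"keysets are not disjoint at {key}")
            input_calc[key] = value
    return input_calc, output_calc


# The fruitEnumType example from the schema above.
FRUIT = [
    ("Apple", "0", ""),
    ("Banana", "1", ""),
    ("Disused", "11 13 15", ""),
    ("Illegal", "12 14", "3 10 16 255"),
]
fruit_in, fruit_out = build_transform(FRUIT)
```

Under these assumptions, the raw value 2 has no entry in fruit_in, which corresponds to inputTypeCalc being a partial function that is undefined outside the repValues of the type.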

Union Transform

Suppose we have multiple types using a common repType, but with disjoint repValues. For instance, we might have a separate type for negative integers and non-negative integers. We can combine these into a single type using the xsd union construct:

...

Here, we require that the repType of all component types match the repType of the parent type. The repValues of the parent type are the disjoint union of the repValues of the child types, and the inputTypeCalc/outputTypeCalc functions are defined piecewise by those of the component types.
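The piecewise definition can be sketched in Python by treating each component transform as a pair of lookup tables and merging them, rejecting any overlap in repValues. This is an illustrative model only; union_transform and the sample tables are hypothetical, and the sketch does not check for overlapping logical values because the text above does not address that case.

```python
from typing import Dict, List, Tuple

Transform = Tuple[Dict[int, str], Dict[str, int]]  # (inputTypeCalc, outputTypeCalc)


def union_transform(components: List[Transform]) -> Transform:
    """Merge component transforms that share one repType into a single transform."""
    input_calc: Dict[int, str] = {}
    output_calc: Dict[str, int] = {}
    for comp_in, comp_out in components:
        overlap = input_calc.keys() & comp_in.keys()
        if overlap:
            raise ValueError(f"repValues are not disjoint: {sorted(overlap)}")
        input_calc.update(comp_in)    # piecewise inputTypeCalc
        output_calc.update(comp_out)  # piecewise outputTypeCalc
    return input_calc, output_calc


# Hypothetical component types over a common integer repType.
negative: Transform = (
    {-2: "minus two", -1: "minus one"},
    {"minus two": -2, "minus one": -1},
)
non_negative: Transform = (
    {0: "zero", 1: "one"},
    {"zero": 0, "one": 1},
)
both_in, both_out = union_transform([negative, non_negative])
```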

Expression Transform

The final type of transform that this proposal will consider is one defined by arbitrary DFDL expressions. These expressions are specified by means of explicit dfdlx:inputTypeCalc and dfdlx:outputTypeCalc annotations on the type. In addition, the repValue set must be explicitly defined by placing dfdlx:repValues and/or dfdlx:repValueRanges directly on the type.

Code Block
<xs:simpleType name="fruitLocalType"
     dfdlx:inputTypeCalc="{ dfdlx:repTypeValue() - 2 }"
     dfdlx:outputTypeCalc="{ dfdlx:logicalTypeValue() + 2 }"
     dfdlx:repType="tns:fruitIntType"
     dfdlx:repValues="12 14"
     dfdlx:repValueRanges="3 10 16 255" >
  <xs:restriction base="xs:int" />
</xs:simpleType>

Note that, in the above example, a validator that is not DFDL-aware will mistakenly believe that all integers are legal values. This can be resolved by explicitly specifying the set of logical values using the XSD restriction mechanism:

Code Block
<xs:simpleType name="fruitLocalType"
    dfdlx:inputTypeCalc="{ dfdlx:repTypeValue() - 2 }"
    dfdlx:outputTypeCalc="{ dfdlx:logicalTypeValue() + 2 }"
    dfdlx:repType="tns:fruitIntType"
    dfdlx:repValues="12 14"
    dfdlx:repValueRanges="3 10 16 255" >
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:enumeration value="10"/>
        <xs:enumeration value="12"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="8"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="14"/>
        <xs:maxInclusive value="253"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

Note that the only effect of adding these restrictions on the logical type is in validation.
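As a cross-check of the example, the Python sketch below models the two expressions as partial functions over the declared repValue set and confirms that the logical values they produce are exactly those listed in the xs:union. It is an illustrative model only; raising an error outside the domain is an assumption about how an undefined value would surface, and the function names are hypothetical.

```python
# repValue set from dfdlx:repValues="12 14" and dfdlx:repValueRanges="3 10 16 255"
REP_VALUES = {12, 14} | set(range(3, 11)) | set(range(16, 256))


def input_type_calc(rep: int) -> int:
    """Model of { dfdlx:repTypeValue() - 2 }, defined only on REP_VALUES."""
    if rep not in REP_VALUES:
        raise ValueError(f"{rep} is not a repValue of fruitLocalType")
    return rep - 2


def output_type_calc(logical: int) -> int:
    """Model of { dfdlx:logicalTypeValue() + 2 }, defined only where the result is a repValue."""
    rep = logical + 2
    if rep not in REP_VALUES:
        raise ValueError(f"{logical} has no representation in fruitLocalType")
    return rep


# Logical values declared by the xs:union: {10, 12}, [1, 8], and [14, 253].
declared_logical = {10, 12} | set(range(1, 9)) | set(range(14, 254))
computed_logical = {input_type_calc(rep) for rep in REP_VALUES}
round_trips = all(output_type_calc(input_type_calc(rep)) == rep for rep in REP_VALUES)
```

In this model computed_logical equals declared_logical, which is the property the restriction is meant to communicate to a validator.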

Interaction with xs:choice

It may be desirable to select a different transform based on the value encountered at runtime. This is possible using the above-mentioned union transform; however, that solution requires that all transforms result in the same element, thereby hiding which case was used in the generated infoset. Additionally, such a method would not allow the distinct transforms to have different output types.

As an alternative, we add two annotations to xs:choice: dfdlx:choiceBranchKeyKind and dfdlx:choiceDispatchKeyKind.

When dfdlx:choiceBranchKeyKind is “byType”, each branch of the xs:choice must be a simple element with a transform. The choice will then behave as if each element specified dfdlx:choiceBranchKey as the set of repValues defined by the type of said element.

When dfdlx:choiceDispatchKeyKind is “byType”, we require all choice options to be simple elements that share a common repType. We then parse the repType, and use the resulting simple value as the choiceDispatchKey.

For example:

Code Block
<xs:choice
  dfdlx:choiceBranchKeyKind="byType"
  dfdlx:choiceDispatchKeyKind="byType">
  <xs:element name="fruit" type="tns:fruitEnumType"/>
  <xs:element name="localFruit" type="tns:fruitLocalType"/>
  <xs:element name="disused" type="tns:fruitDisusedType"/>
</xs:choice>
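The byType dispatch can be pictured as a table from each repValue to the branch whose type claims it. The Python sketch below builds such a table and rejects overlapping repValue sets. It is a model of the described behavior rather than Daffodil's implementation, and the repValue sets assigned to the three branches are hypothetical, chosen only so that they are mutually disjoint.

```python
from typing import Dict, Set


def build_dispatch_table(branches: Dict[str, Set[int]]) -> Dict[int, str]:
    """Map every repValue to the name of the single branch whose type contains it."""
    table: Dict[int, str] = {}
    for name, rep_values in branches.items():
        for rep in rep_values:
            if rep in table:
                raise ValueError(
                    f"repValue {rep} is claimed by both {table[rep]!r} and {name!r}"
                )
            table[rep] = name
    return table


# Hypothetical repValue sets for the three branch types.
BRANCHES = {
    "fruit": {0, 1},
    "localFruit": {12, 14} | set(range(3, 11)) | set(range(16, 256)),
    "disused": {11, 13, 15},
}
dispatch = build_dispatch_table(BRANCHES)


def select_branch(rep_value: int) -> str:
    """Return the branch chosen for a parsed repType value, or fail if none matches."""
    if rep_value not in dispatch:
        raise ValueError(f"no choice branch accepts repValue {rep_value}")
    return dispatch[rep_value]
```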

...

In this case, the binary input 02 would parse to <two>1 or 2</two>. However, it is ambiguous if we should unparse this according to the canonical value of the type (1), or the canonical branchKey (2).

Using with explicit raw elements

It may be desirable to include both the raw and logical values in the infoset. Traditionally, this use case has been accomplished using inputValueCalc and outputValueCalc annotations. This remains the case here. To support this use case, we expose the inputTypeCalc/outputTypeCalc functions to the DFDL expression language:

Code Block
<xs:sequence>
  <xs:element name="raw" type="tns:fruitRepType"
    dfdlx:outputValueCalc="dfdlx:outputTypeCalc(tns:fruitEnumType, ../fruit)"/>
  <xs:element name="fruit" type="tns:fruitEnumType"
    dfdlx:inputValueCalc="dfdlx:inputTypeCalc(tns:fruitEnumType, ../raw)"/>
</xs:sequence>

...

Code Block
<xs:sequence>
  <xs:element name="raw" type="tns:fruitIntType"
    dfdlx:outputValueCalc="dfdlx:outputTypeCalcNextSibling()" />
  <xs:choice dfdlx:choiceBranchKeyKind="byType"
    dfdlx:choiceDispatchKeyKind="explicit" dfdlx:choiceDispatchKey="../raw" >
    <xs:element name="fruit" type="tns:fruitType"
      dfdlx:inputValueCalc="dfdlx:inputTypeCalc(tns:fruitType, ../raw)" />
    <xs:element name="localFruit" type="tns:fruitLocalType"
      dfdlx:inputValueCalc="dfdlx:inputTypeCalc(tns:fruitLocalType, ../raw)" />
    <xs:element name="disused" type="tns:fruitDisusedType"
      dfdlx:inputValueCalc="dfdlx:inputTypeCalc(tns:fruitDisusedType, ../raw)" />
  </xs:choice>
</xs:sequence>

In principle, this could be accomplished more generically, by allowing dfdlx:outputTypeCalc to take an arbitrary expression returning a path to a node, along with some form of next-sibling function (to allow for the fact that there is not a constant name for the next sibling). However, for ease of implementation, only this more limited structure will be supported by this proposal.

Summary of annotations

  • dfdlx:repType
    • Applies to xs:simpleType
    • Defines the representation type associated with the annotated type.
    • On parse, the DFDL processor first parses according to the repType, then applies any conversion specified by the annotated type.
    • On unparse, the DFDL processor first applies the conversion specified by the annotated type, then the unparse behavior specified by the repType
  • dfdlx:choiceBranchKeyKind
    • Applies to xs:choice
    • Values: byType, explicit, speculative, implicit
    • byType
      • Each choice option must be a simple element
      • All choice options must have a type with a common repType
      • The repValue sets of all options must be mutually disjoint
      • The choice dispatch will behave as if the choiceBranchKeys specified by an option are the repValue set of the option's type.
    • Explicit
      • Each choice option must directly specify a choiceBranchKey. These values will be used for direct dispatch
      • Requires choiceDispatchKeyKind=explicit as well
    • Speculative
      • Direct dispatch will not be used. Choice options will be parsed speculatively, and the first non-failing case will be used
      • Requires choiceDispatchKeyKind=speculative
    • Implicit
      • Current behavior
      • If choice options provide explicit choiceBranchKeys, then behave as if we were “explicit”
      • Otherwise, behave as if we were “speculative”
  • dfdlx:choiceDispatchKeyKind
    • Applies to xs:choice
    • Values: byType, explicit, speculative, implicit
    • byType
      • Each choice option must be a simple element
      • All choice options must have a type with a common repType
      • First, parse according to the common repType without consuming any input
      • Then, use the resulting value as the choiceDispatchKey
    • Explicit
      • Gets the choiceDispatchKey from the dfdlx:choiceDispatchKey annotation
    • Speculative
      • Direct dispatch will not be used. Choice options will be parsed speculatively, and the first non-failing case will be used
      • Requires choiceBranchKeyKind=speculative
    • Implicit
      • Current behavior
      • If dfdlx:choiceDispatchKey is present, then behave as if we were explicit
      • Otherwise, behave as if we were speculative
  • dfdlx:inputTypeCalc
    • Applies to xs:simpleType
    • Requires dfdlx:repType to also be present
    • Is a DFDL expression
    • On parse, first parse according to the repType, then populate the value of this element to the result of evaluating the dfdlx:inputTypeCalc expression
    • The value of the repType may be accessed by the expression through the dfdlx:repTypeValue function
  • dfdlx:outputTypeCalc
    • Applies to xs:simpleType
    • Requires dfdlx:repType to also be present
    • Is a DFDL expression
    • On unparse, first evaluate this expression, then unparse according to the repType as if the logical value were the result of evaluating this expression
    • The original logical value of this type may be accessed by the expression through the dfdlx:logicalTypeValue function
  • dfdlx:repValues
    • Applies to xs:enumeration and xs:simpleType
    • A space separated list of values
    • Values must be of a type consistent with repType
    • When applied to xs:enumeration:
      • Defines a KeySet-Value transform, and associates the annotated enumeration value with the listed keys
      • Adds the listed values to the repValue set of the parent simpleType
    • When applied to xs:simpleType:
      • Adds the listed keys to the repValue set of the annotated simpleType
      • This set will be used by xs:choice when choiceBranchKeyKind=byType
  • dfdlx:repValueRanges
    • Applies to xs:enumeration and xs:simpleType
    • Requires dfdlx:repType to be present and refer to an integer type
    • A space separated list of integers defining ranges of integers
    • Takes the form “min1 max1 min2 max2 … minN maxN”
    • Represents the set of integers described by the union of the intervals [minK, maxK]
    • Behaves as if all members of this set were included in the dfdlx:repValues annotation

Summary of Functions

  • dfdlx:inputTypeCalc(f: QName, x: A)
    • f must be a constant QName resolving to a simpleType with a transform defined.
    • The type of x is determined statically at compile time as the primitive type of the repType of f.
    • The return type is given by the primitive type of the logical type of f.
    • If the types given by f do not match what is required, the relevant expression may be cast according to standard DFDL expression casting rules.
    • Returns the result of applying the inputTypeCalc function associated with f to x.
  • dfdlx:outputTypeCalc(f: QName, x: Any)
    • f must be a constant QName resolving to a simpleType with a transform defined.
    • The type of x is determined statically at compile time as the primitive type of the logical type of f.
    • The return type is given by the primitive type of the repType of f.
    • If the types given by f do not match what is required, the relevant expression may be cast according to standard DFDL expression casting rules.
    • Returns the result of applying the outputTypeCalc function associated with f to x.
  • dfdlx:outputTypeCalcNextSibling()
    • Returns the result of applying the outputTypeCalc defined by the immediately following sibling of the current node to the value contained by that sibling.
    • Can only be used on unparse.
    • Requires that all potential following siblings have the same repType and define an outputTypeCalc.
    • The return type is the primitive type of the repType of the following sibling.
    • If the return type does not match what is expected by the containing expression, it will be cast according to standard DFDL expression casting rules.
  • dfdlx:repTypeValue()
    • Can only be called from inside dfdlx:inputTypeCalc.
    • Returns the value of the underlying repType.
  • dfdlx:logicalTypeValue()
    • Can only be called from inside dfdlx:outputTypeCalc.
    • Returns the logical value of this element.