This is a simplification that uses only a small subset of the features from a more elaborate proposal: Proposal: (Superceded) Feature to support enumerations and typeValueCalc

There are also sections below on Prior Features Removed and Prior Features Retained but Deprecated.

Introduction

This page describes the enumerations feature in Daffodil (to be proposed for DFDL v2.0), which is a simplification of the prior-generation experimental features.

Some aspects of the prior-generation experimental features are in use in important DFDL schemas by Daffodil user communities and will be retained for the foreseeable future, but eventually should be deprecated.

Properties dfdlx:repType and dfdlx:repValues

Enumerations are defined using two new properties, dfdlx:repType and dfdlx:repValues, as shown below:

<xs:simpleType name="vehicleType" dfdlx:repType="tns:uint3">
   <xs:restriction base="xs:string">
     <xs:enumeration value="NoStatement" dfdlx:repValues="0"/> 
     <xs:enumeration value="truck"       dfdlx:repValues="1" />
     <xs:enumeration value="suv"         dfdlx:repValues="2" />
     <xs:enumeration value="bus"         dfdlx:repValues="3" />
     <xs:enumeration value="train"       dfdlx:repValues="4" />
     <xs:enumeration value="car"         dfdlx:repValues="5" />
     <!-- ILLEGAL 6 -->
     <!-- ILLEGAL 7 -->
  </xs:restriction>
</xs:simpleType>
 
<xs:simpleType name="uint3" dfdl:length="3" dfdl:lengthUnits="bits">
  <xs:restriction base="xs:unsignedInt"/>
</xs:simpleType>

The use of dfdlx:repValues is optional. If it is not present, the implied behavior should be as if it were specified as illustrated above, with the first enumeration value mapped to 0 and each subsequent value mapped to the next integer.
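As an illustration of the implied numbering, here is a minimal sketch in which dfdlx:repValues is omitted entirely. The type name directionType and its enumeration values are hypothetical and are not taken from any existing schema; the comments show the representation value each entry would be expected to receive under the rule described above.

```xml
<!-- Hypothetical type: dfdlx:repValues is omitted, so values are implied by position. -->
<xs:simpleType name="directionType" dfdlx:repType="tns:uint3">
  <xs:restriction base="xs:string">
    <xs:enumeration value="north"/> <!-- implied dfdlx:repValues="0" -->
    <xs:enumeration value="east"/>  <!-- implied dfdlx:repValues="1" -->
    <xs:enumeration value="south"/> <!-- implied dfdlx:repValues="2" -->
    <xs:enumeration value="west"/>  <!-- implied dfdlx:repValues="3" -->
  </xs:restriction>
</xs:simpleType>
```

Because the numbering depends on document order, reordering the enumeration elements in such a schema would change the mapping.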

Note that there must be one enumeration value for every possible value of the representation type integer.

The comments in the example above noting that values 6 and 7 are illegal are not visible to Daffodil; they are merely comments.

An attempt to parse an unmapped integer should result in the string ILLEGAL_N where N is the unmapped integer value. This results in such a value being considered well-formed, but as it does not match a value in the enumerations allowed, it will be considered invalid. 
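To make the proposed parse behavior concrete, the sketch below assumes a hypothetical element named vehicle declared with the vehicleType shown earlier. The element name is an assumption for illustration only; the infoset fragments show what the text above describes, not the output of any particular Daffodil release.

```xml
<!-- Hypothetical element declaration using the vehicleType defined above. -->
<xs:element name="vehicle" type="tns:vehicleType"/>

<!-- Representation integer 2 is mapped, so the infoset carries the enumerated string: -->
<vehicle>suv</vehicle>

<!-- Representation integer 6 is unmapped. Under the proposal the parse still succeeds
     (well-formed), but the value matches no enumeration facet, so validation fails: -->
<vehicle>ILLEGAL_6</vehicle>
```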

The prefix string such as "ILLEGAL" shown here should be configurable via a Daffodil tunable, with value "ILLEGAL" as the default.

Actual enumeration values named "ILLEGAL" or prefixed that way are not an error.

Unparsing requires valid data input. Hence an attempt to unparse a string such as "ILLEGAL_7" fails with a processing error.

All the dfdlx:repValues integers must be distinct.

More than one dfdlx:repValues integer may be specified for an enumeration value. On unparsing, the first is used.
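A minimal sketch of an enumeration value with more than one representation integer follows. It assumes that dfdlx:repValues accepts a whitespace-separated list of integers; that list syntax is an assumption here, as is the choice of 5 and 6 for "car".

```xml
<!-- Assumed syntax: a whitespace-separated list of representation integers. -->
<!-- Parsing either 5 or 6 would yield "car"; unparsing "car" would write 5, the first listed. -->
<xs:enumeration value="car" dfdlx:repValues="5 6"/>
```

The integers listed here must still be distinct from those used by every other enumeration value in the same type.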

All the enumeration strings must be distinct.

Only mappings from non-negative integers to strings (when parsing) and from strings back to non-negative integers (when unparsing) are supported.

The upper bound on the size of an enumeration is 16 bits (64K enumeration values), but this may be enlarged in the future if needed. The largest known enumeration as of this writing has 4096 entries (12 bits).

The runtime implementation uses an array to map from integers to strings, and a hash table-like technique to map from strings back to integers, so as to achieve constant time for parse and unparse for each such enumerated-value element. 

Implementation Note:

This feature is already implemented as of Daffodil 2.4.0 with exceptions for:

  • The ability for the dfdlx:repValues property to be omitted and its value implied.
    • Schemas should always specify the dfdlx:repValues property for now.
  • The ability to synthesize ILLEGAL_N from an unmapped integer.
    • Currently a processing error occurs.
    • Schemas should define explicit ILLEGAL_N enumeration values for all otherwise unmapped integers, along with a pattern facet whose regular expression indicates that the string cannot begin with "ILLEGAL_", such as "[^(ILLEGAL)].*". Note that this example pattern is a negated character class, so it rejects any value whose first character is one of "(", "I", "L", "E", "G", "A", or ")", not only values that begin with "ILLEGAL".
      • This works because within a simple-type definition enumerations and patterns are ANDed. One of the enumerations must be satisfied, AND one of the patterns (if there is more than one pattern).
      • This could be extended if other enumeration values should be considered well-formed but invalid. The regular expression can also exclude those, e.g., UNDEFINED, UNUSED, or other marker enumeration values.
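The workaround described in the list above could be sketched as follows for the 3-bit vehicleType example. This is an illustrative rewrite of that example, not a schema taken from an existing project; the pattern value is copied verbatim from the text and excludes values by their first character only.

```xml
<xs:simpleType name="vehicleType" dfdlx:repType="tns:uint3">
  <xs:restriction base="xs:string">
    <!-- Pattern as given in the text: values starting with (, I, L, E, G, A or ) are invalid. -->
    <xs:pattern value="[^(ILLEGAL)].*"/>
    <xs:enumeration value="NoStatement" dfdlx:repValues="0"/>
    <xs:enumeration value="truck"       dfdlx:repValues="1"/>
    <xs:enumeration value="suv"         dfdlx:repValues="2"/>
    <xs:enumeration value="bus"         dfdlx:repValues="3"/>
    <xs:enumeration value="train"       dfdlx:repValues="4"/>
    <xs:enumeration value="car"         dfdlx:repValues="5"/>
    <!-- Explicit entries so that 6 and 7 parse without a processing error. -->
    <!-- They satisfy an enumeration facet but fail the pattern facet, so they are invalid. -->
    <xs:enumeration value="ILLEGAL_6"   dfdlx:repValues="6"/>
    <xs:enumeration value="ILLEGAL_7"   dfdlx:repValues="7"/>
  </xs:restriction>
</xs:simpleType>
```

With this arrangement every 3-bit integer has a mapping, so parsing always produces a string, and validation is what distinguishes the legal values from the ILLEGAL_N ones.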

Prior Features Removed

These dfdlx properties described in the prior proposal are removed. They are not in use.  

  • choiceBranchKeyKind
  • choiceDispatchKeyKind

These dfdlx functions were already removed from Daffodil prior to the Daffodil 3.2.0 release.

  • repTypeValue
  • logicalTypeValue
  • outputTypeCalcNextSibling

Prior Features Retained but Deprecated

These features are in substantial use. In the future, one would hope to deprecate them and update the DFDL schemas that use them to the simplified approach. However, because these schemas are deployed in products and systems by Daffodil users, this change may take a long time.

  • Properties
    • dfdlx:repValueRanges
    • choiceBranchKeyRanges 
  • dfdlx:inputTypeCalc(f: QName, x:A)
    • f must be a constant QName resolving to a simpleType with a transform defined
    • The type of x is determined statically at compile time as the primitive type of the repType of f.
    • The return type is given by the primitive type of the logical type of f.
    • If the types given by f do not match what is required, the relevant expression may be cast according to standard DFDL expression casting rules.
    • Returns the result of applying the inputTypeCalc function associated with f to x
  • dfdlx:outputTypeCalc(f: QName, x:Any)
    • f must be a constant QName resolving to a simpleType with a transform defined
    • The type of x is determined statically at compile time as the primitive type of the logical type of f.
    • The return type is given by the primitive type of the repType of f.
    • If the types given by f do not match what is required, the relevant expression may be cast according to standard DFDL expression casting rules.
    • Returns the result of applying the outputTypeCalc function associated with f to x