This version has been superseded by this new proposal 

A prior proposal is here.

Introduction

Much data contains numeric values that are enumerations, with corresponding logical strings that provide the symbolic interpretation of them.

...

It has been suggested that the transformations expressed here between numeric and symbolic data would be useful in contexts outside of DFDL, and that this proposal could be formulated as an extension of XSLT or XQuery. This is certainly the case. However, the proposal is presented first in the context of DFDL.

Proposal

Each enum value for a string simple type can be annotated with properties that give the corresponding numeric value(s) either as a discrete list, or as a numeric range.

...

Code Block
<xs:element name="AltitudeSource" type="tns:AltitudeSourceType"/>

<xs:simpleType name="AltitudeSourceType">
<xs:restriction base="xs:string">
  <xs:enumeration value="Sensor"/>
  <xs:enumeration value="InstrumentRead"/>
  <xs:enumeration value="Estimated"/>
  <xs:enumeration value="Illegal"/>
</xs:restriction>
</xs:simpleType>

If we provide a function for use in dfdl:inputValueCalc named dfdl:lookupValue(key), this function would examine properties specified on the enumerations and perform the corresponding translation.

Now consider this example, which updates the above to provide numeric representation mappings for the symbolic values. Note that it uses more than one key in some cases, and numeric ranges in others:

Code Block
<xs:simpleType name="AltitudeSourceType"
  dfdl:repType="tns:AltitudeSourceIntType"> <!-- the rep type -->
  <xs:restriction base="xs:string"> <!-- the logical type -->
    <xs:enumeration value="Sensor" dfdl:lookupKey="1"/>
    <xs:enumeration value="InstrumentRead" dfdl:lookupKey="2"/>
    <xs:enumeration value="Estimated" dfdl:lookupKey="3 4 5 6 7"/>
    <xs:enumeration value="Illegal"
      dfdl:lookupRange="8 255 512 1023"
      dfdl:lookupKey="255"/>
    <xs:enumeration value="Reserved"
      dfdl:lookupRange="0 0 256 511"
      dfdl:lookupKey="511"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="AltitudeSourceIntType"><!-- all properties for a simple type can go here -->
  <xs:restriction base="xs:int">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="1023"/>
  </xs:restriction>
</xs:simpleType>

...

When both dfdl:lookupRange and dfdl:lookupKey are specified, they are combined to create the aggregate set of values and ranges for parsing. When unparsing, the first dfdl:lookupKey is used. If no dfdl:lookupKey is specified, then the lowest value of the first specified dfdl:lookupRange is used when unparsing.

Given these annotations, a DFDL processor can provide a logical string in the infoset where the underlying representation is an integer. Unparsing inverts the mapping, converting the logical value back to a physical integer.
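The parse and unparse rules can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the `ENUMS` layout and the function names `parse_value` and `unparse_value` are hypothetical, each dfdl:lookupRange is assumed to be a list of inclusive (low, high) pairs, and the 511 given on "Reserved" is read here as a dfdl:lookupKey. Parsing checks the combined keys and ranges; unparsing returns the first key, or the lowest value of the first range when no key is given.

```python
# Logical string -> (lookupKey values, lookupRange inclusive pairs).
# Values mirror the AltitudeSourceType example; the layout is an assumption.
ENUMS = {
    "Sensor": ([1], []),
    "InstrumentRead": ([2], []),
    "Estimated": ([3, 4, 5, 6, 7], []),
    "Illegal": ([255], [(8, 255), (512, 1023)]),
    "Reserved": ([511], [(0, 0), (256, 511)]),
}


def parse_value(rep: int) -> str:
    """Map a physical integer to its logical string (keys and ranges combined)."""
    for logical, (keys, ranges) in ENUMS.items():
        if rep in keys or any(lo <= rep <= hi for lo, hi in ranges):
            return logical
    raise ValueError(f"no enumeration covers {rep}")


def unparse_value(logical: str) -> int:
    """Map a logical string to an integer: first key, else lowest of first range."""
    keys, ranges = ENUMS[logical]
    if keys:
        return keys[0]
    if ranges:
        return ranges[0][0]
    raise ValueError(f"{logical} has neither a key nor a range")
```

Note that the mapping is not symmetric: several physical values can parse to "Estimated", but unparsing "Estimated" always yields a single canonical integer.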

...

Note that the above can be implemented without use of advanced DFDL features like dfdl:inputValueCalc. The simple existence of an element with a dfdl:repType property would enable an implementation of this table-lookup capability without the need for a complete implementation of DFDL's expression language.

Recasting the Proposal for use Outside of DFDL Context

If the above were recast for use in XSLT or XQuery, or in Schematron assertions, etc., then the sensible thing would be to provide these same annotations on an XSD schema, along with an XSLT function that can be called, passing a value of the physical type to get a value of the logical type, or vice versa. The function would be given the type name of the logical type. E.g., assuming namespace prefix f, the functions might be

Code Block
f:lookupRep(logicalTypeQName, ...value of the rep type...) returns value of the logical type
f:lookupValue(logicalTypeQName, ... value of the logical type... ) returns value of the rep type
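To show that these two functions need nothing beyond the annotated schema, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. It is an assumption-laden illustration: `lookup_rep` and `lookup_value` are hypothetical stand-ins for f:lookupRep and f:lookupValue, the type is identified by its local name rather than a full QName, only dfdl:lookupKey is handled (ranges are omitted for brevity), and the dfdl:lookupKey attribute itself exists only in this proposal.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"

# A cut-down copy of the annotated schema from the proposal.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:dfdl="{DFDL}">
  <xs:simpleType name="AltitudeSourceType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Sensor" dfdl:lookupKey="1"/>
      <xs:enumeration value="InstrumentRead" dfdl:lookupKey="2"/>
      <xs:enumeration value="Estimated" dfdl:lookupKey="3 4 5 6 7"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def _enumerations(schema_root, type_name):
    """Yield (logical value, [integer keys]) for the named simple type."""
    for simple_type in schema_root.iter(f"{{{XS}}}simpleType"):
        if simple_type.get("name") == type_name:
            for enum in simple_type.iter(f"{{{XS}}}enumeration"):
                raw = enum.get(f"{{{DFDL}}}lookupKey", "")
                yield enum.get("value"), [int(k) for k in raw.split()]


def lookup_rep(schema_root, type_name, rep_value):
    """Stand-in for f:lookupRep: rep value in, logical value out."""
    for logical, keys in _enumerations(schema_root, type_name):
        if rep_value in keys:
            return logical
    raise KeyError(rep_value)


def lookup_value(schema_root, type_name, logical_value):
    """Stand-in for f:lookupValue: logical value in, first key out."""
    for logical, keys in _enumerations(schema_root, type_name):
        if logical == logical_value and keys:
            return keys[0]
    raise KeyError(logical_value)


root = ET.fromstring(SCHEMA)
```

For example, `lookup_rep(root, "AltitudeSourceType", 4)` gives the logical value for the integer 4, and `lookup_value(root, "AltitudeSourceType", "Estimated")` gives the integer to write back out.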

Multi-Dimensional and other Complex Lookups and Function

Note: This is a more advanced feature. It is not initially part of the proposal, but is included here for initial feedback.

...

In this case, you get an XML Infoset that looks like

Code Block
<dfdl:Table1>
  <Sensor><Boeing747>VeryHigh</Boeing747><Drone2>High</Drone2>...</Sensor>
  <InstrumentRead><Boeing747>High</Boeing747><Drone2>NS</Drone2>...</InstrumentRead>
  ...
</dfdl:Table1>

This table is referenced using a function call:

Code Block
dfdl:callFunction("tns:AltitudePrecisionTable", ../AltitudeSource, ../Platform)

The dfdl:inputValueCalc expression can be used to populate an element with this value.
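The lookup itself amounts to walking one child element per parameter. A minimal Python sketch, assuming a hypothetical `lookup` helper and a table reduced to the entries shown above (the dfdl: prefix on the root is dropped so the snippet needs no namespace declaration):

```python
import xml.etree.ElementTree as ET

# Only the entries visible in the example table; the elided rows are left out.
TABLE = ET.fromstring("""
<Table1>
  <Sensor><Boeing747>VeryHigh</Boeing747><Drone2>High</Drone2></Sensor>
  <InstrumentRead><Boeing747>High</Boeing747><Drone2>NS</Drone2></InstrumentRead>
</Table1>
""")


def lookup(table, *params):
    """Descend one child element per parameter and return the leaf text."""
    node = table.find("/".join(params))
    if node is None:
        raise KeyError(params)
    return node.text
```

Here `lookup(TABLE, "Sensor", "Boeing747")` plays the role of the function call with ../AltitudeSource and ../Platform as arguments. Because the parameter values become path steps, this only works when they are legal element names.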

To achieve the inverse lookup at unparse time requires a different table, or requires that the ../Platform and ../AltitudeSource elements are themselves members of the Infoset, so they need not be computed.

Now, the above works only if the parameter values are acceptable as the NCNames of elements. That would be common, but is not universally true.

If not, then a more complex table and query are needed, and the non-NCName values must appear as the values of elements, e.g.,

Code Block
<dfdl:KVTable>
  <Pair>
    <Key>Sensor</Key>
    <Value><Pair><Key>Boeing747</Key><Value>VeryHigh</Value></Pair><Pair>...
  </Pair>
  ...
</dfdl:KVTable>

Then we want the XPath to be:

Code Block
Pair[Key eq $AltitudeSource]/Value/Pair[Key eq $Platform]/Value

to compute the value. This is potentially less performant, because it is not obvious that finding the Pair whose Key element has a specific value is going to be O(1), i.e., constant time.
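One way an implementation might address the constant-time concern is to index the key/value table once, when it is loaded, into nested hash maps. The Python sketch below is only a suggestion, not something the proposal specifies: `build_index` is a hypothetical helper, the table holds only entries that appear in the examples above, and the dfdl: prefix is dropped to keep the snippet free of namespace declarations.

```python
import xml.etree.ElementTree as ET

KV_TABLE = ET.fromstring("""
<KVTable>
  <Pair>
    <Key>Sensor</Key>
    <Value>
      <Pair><Key>Boeing747</Key><Value>VeryHigh</Value></Pair>
      <Pair><Key>Drone2</Key><Value>High</Value></Pair>
    </Value>
  </Pair>
</KVTable>
""")


def build_index(node):
    """Convert nested Pair/Key/Value elements into nested dicts, built once."""
    index = {}
    for pair in node.findall("Pair"):
        value = pair.find("Value")
        # A Value that contains Pair children is another table level.
        nested = value.find("Pair") is not None
        index[pair.findtext("Key")] = build_index(value) if nested else value.text
    return index


INDEX = build_index(KV_TABLE)
```

After this one-time pass, `INDEX["Sensor"]["Boeing747"]` is two dictionary lookups, each constant time on average, and the keys are ordinary strings, so they are not restricted to NCNames.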