Introduction: The Value Element Problem

Adding XML attribute support to DFDL (as proposed in Proposal: Extend DFDL with XML Attribute Support) would be a step in the right direction. However, it does not solve the general problem of starting from a logical XML schema that you like and then annotating it with DFDL properties so that it matches the data.

The challenge is that a DFDL schema must reflect the actual structure of the physical data, so that there is a place to put the annotations that capture the complexity of that representation.

There are really two problems here:

  1. describe the actual representation of the data, and then
  2. describe the mapping of it to the XML form you actually like.

DFDL addresses only the first of these two problems. If the mapping needed for (2) is complicated, then this separation of concerns is natural, and using a separate data-transformation tool to carry out the mapping seems appropriate.

The difficulty arises when the mapping in (2) is very simple. Often the DFDL schema is close to what is desired, but not close enough for the differences to stop being an annoyance. Crafting the mapping in (2) is tedious when the vast majority of it consists of repetitive, simplistic changes.

Here is an example. Many formats use "smart" string representations. To a logical user of the XML, this data is simply an element of simple type xs:string, but describing the actual representation requires a sequence:

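<xs:complexType name="tString50Type">
  <xs:sequence>
    <!-- Either 0 to 49 characters other than '~' that are followed by a '~', or exactly 50 characters -->
    <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
      dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
    <!-- The '~' terminator is present only when the value is shorter than 50 characters -->
    <xs:sequence
      dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
  </xs:sequence>
</xs:complexType>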

This describes a string with a maximum length of 50 characters, which has a '~' terminating delimiter only if its length is less than 50. If the length is exactly 50, then no delimiter is present.

Representing this structure requires a complex type, even though the logical data is "just a string".

This need for a complex representation type for a simple logical type is pervasive, and it results in the "value element" problem: a logical user of the XML sees not

<elem>string</elem>

but rather

<elem><value>string</value></elem>

The seemingly simple logical string type requires that a complex type be used to describe all the complexity of its representation. Hence, the best we can do is to have one more level of elements inside that complex type, one of which carries the simple value we seek.
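For comparison, this particular difference could be handled outside DFDL, as a mapping step (2) applied after parsing. The XSLT 1.0 stylesheet below is a minimal sketch and is not part of this proposal. It assumes that the 'value' children are unqualified (in no namespace), and that every element whose only element child is named 'value' should be collapsed.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy all attributes and nodes not matched by a more specific rule -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed rule: an element whose single element child is named 'value'
       is copied with that child's text as its own content -->
  <xsl:template match="*[count(*) = 1 and value]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="value"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to <elem><value>string</value></elem>, this yields <elem>string</elem>. A matching inverse transformation would still be needed to restore the value elements before unparsing, which is exactly the kind of separate, repetitive mapping step that the solution proposed below aims to avoid.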

This problem also prevents the use of XML attributes as the natural logical XML for any simple type value that has a complex representation. At best you would get

<elem value="string"/>

which is still unnatural and inefficient.

Solution: Properties for Complex Representation for Simple Types - The dfdlx:repType Property

What is needed is a mechanism by which a simple type element (or, someday, an attribute) can get its value from a hidden complex structure.

A dfdlx (extension) property, dfdlx:repType (or dfdl:repDef), was proposed long ago to enable simple types to have a complex type as their representation. This proposal was not included in DFDL version 1.0 because there was little experience at the time with DFDL in general, and with its powerful dfdl:inputValueCalc, dfdl:outputValueCalc, and hidden group capabilities in particular. Now that there is some experience with these aspects of DFDL, it is clear that dfdlx:repType is still needed to enable a logical simple type to have a complex type representation.

The value element within that complex type would provide the corresponding value used for the logical simple type during parsing and unparsing.

This is very much like putting a DFDL hidden group next to the element of simple type, but it buries the complexity inside the simple type definition, where it does not clutter the logical data model.
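For comparison, here is a minimal sketch of that hidden group approach in DFDL v1.0, reusing the smart string representation. The names hString50Group, hiddenValue, and recordType are hypothetical, and the sketch assumes the remaining format properties (encoding and so on) come from a dfdl:format default elsewhere in the schema. The hidden group holds the physical representation. dfdl:inputValueCalc copies the parsed value into the logical element, and dfdl:outputValueCalc computes the hidden element from the logical element when unparsing.

```xml
<!-- Hypothetical hidden group carrying the physical representation of the smart string -->
<xs:group name="hString50Group">
  <xs:sequence>
    <!-- Parsed from the data; when unparsing, its value is computed from the logical element -->
    <xs:element name="hiddenValue" type="xs:string" dfdl:lengthKind="pattern"
      dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}"
      dfdl:outputValueCalc="{ ../myString }" />
    <!-- Context node here is the enclosing record element, so hiddenValue is a child path -->
    <xs:sequence
      dfdl:terminator="{ if (fn:string-length(./hiddenValue) eq 50) then '%ES;' else '~' }" />
  </xs:sequence>
</xs:group>

<xs:complexType name="recordType">
  <xs:sequence>
    <!-- The hidden group sits next to the logical element and does not appear in the infoset -->
    <xs:sequence dfdl:hiddenGroupRef="tns:hString50Group" />
    <!-- The logical element takes its value from the hidden element when parsing -->
    <xs:element name="myString" type="xs:string"
      dfdl:inputValueCalc="{ ../hiddenValue }" />
  </xs:sequence>
</xs:complexType>
```

The logical infoset contains only <myString>theString</myString>, but the hidden element, its expressions, and the hidden group reference all live in the enclosing record's type. Because the dfdl:outputValueCalc expression names one specific logical element, the hidden group must be duplicated for every smart string field. This is the clutter that dfdlx:repType would move inside the simple type definition.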

Note that there is already a dfdlx:repType property used to enable numeric simple type values to be converted to string values from an enum type. (See: Proposal: (Superceded) Feature to support enumerations and typeValueCalc)

This usage would not conflict with that, because in this case the representation type is a complex type. 

Consider the following example, which reuses the smart string from the section above.

This first example block shows how this must be captured in DFDL v1.0 without extensions.

<xs:element name="myString" type="tns:tString50Type"/>

<xs:complexType name="tString50Type">
  <xs:sequence>
    <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
      dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
    <xs:sequence
      dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
  </xs:sequence>
</xs:complexType>

Given the data "theString~", this results in XML instances that look like:

<myString><value>theString</value></myString>

The proposal here is to recast this using a dfdlx:repType extension as follows:

<xs:element name="myString" type="xs:string" dfdlx:repType="tns:tString50Type"/>

<xs:complexType name="tString50Type">
  <xs:sequence>
    <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
      dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
    <xs:sequence
      dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
  </xs:sequence>
</xs:complexType>

This reuses the definition of tString50Type from above, showing that the part of the schema that describes the actual representation of this string data is unchanged from the way it must be captured in DFDL v1.0.

The child element named 'value' is special. That name is a keyword identifying the child element of simple type that provides the value for the logical element's simple type.

The extension property dfdlx:repType simply allows the complexity to be hidden better.

The resulting XML instances would look like:

<myString>theString</myString>