Introduction: The Value Element Problem

Adding attribute support to DFDL (as proposed in Proposal: Extend DFDL with XML Attribute Support) will be a step in the right direction, but it does not solve the general problem of starting from a logical XML schema that you like and then matching it up against real data and its representation properties.

The challenge here is that a DFDL schema has to reflect the actual structure of the physical data, so as to provide a place for the annotations that capture the complexity of that representation.

Really you have two problems here:

  1. describe the actual representation of the data, and then
  2. describe the mapping of it to the XML form you actually like.

DFDL is only about the first of these two problems. If the mapping needed for (2) is complicated then this separation of concerns is natural, and some data-transformation tool to carry out the mapping seems appropriate.

The challenge comes when the mapping in (2) is very simple. Often the DFDL schema is close to what is desired, but the remaining differences are still an annoyance. Crafting the mapping in (2) is tedious when the vast majority of it consists of simple, repetitive changes.

Here is an example. Many formats use smart string representations. To a logical user of XML this data is an element of simple type, xs:string. The actual representation

...

requires a sequence:

Code Block
   <xs:complexType name="tString50Type">
    <xs:sequence>
      <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
        dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
      <xs:sequence
        dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
    </xs:sequence>
  </xs:complexType>

This is a string with maximum length 50, which has a terminating delimiter only if the length is less than 50. If the length is exactly 50, then no delimiter is present. 

This structure requires a complex type to represent, even though the logical data is "just a string".

This need for a complex representation type for a simple logical type is quite pervasive and results in the "value element" problem where a logical user of the XML sees not

Code Block
<elem>string</elem>

but rather

Code Block
<elem><value>string</value></elem>

The seemingly simple logical string type requires a complex type be used to describe all the complexity of its representation; hence, the best we can do is to have one more tier of elements inside that complex type, one of which carries the simple value we seek.

This problem will prevent use of XML attributes as the natural logical XML for any simple type value that has a complex representation. At best you would get 

Code Block
<elem value="string"/>

which is still unnatural and inefficient.

Solution: Properties for Complex Representation for Simple Types - The dfdlx:repType Property

What is needed is a mechanism by which a simple type element (or, someday, an attribute) can get its value from a hidden complex structure.

A dfdlx (extension) property dfdlx:repType or dfdl:repDef was proposed long ago to enable simple types to have a complex type as their representation. This proposal was not included in DFDL version 1.0 due to inexperience with DFDL generally, and with the powerful dfdl:inputValueCalc, dfdl:outputValueCalc, and hidden groups capabilities. Now that there is some experience with these aspects of DFDL, it is clear that the need for dfdlx:repType remains to enable a logical simple type to have a complex type representation.

The value element within that complex type would provide the corresponding value used for the logical simple type during parsing and unparsing.

This is very much like putting a DFDL hidden group next to the element of simple type, but it buries the complexity inside the simple type definition, where it does not clutter the logical data model.
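For comparison, the hidden-group approach mentioned above might look roughly like the following in DFDL v1.0. This is only a sketch under stated assumptions, not a tested schema: the names record, myStringRep, and hMyStringRep are hypothetical, and the representation is written inline so that dfdl:outputValueCalc can be placed on the value element. It uses the standard dfdl:hiddenGroupRef, dfdl:inputValueCalc, and dfdl:outputValueCalc properties.

```xml
<!-- Sketch only. record, myStringRep, and hMyStringRep are hypothetical names. -->
<xs:group name="hMyStringRep">
  <xs:sequence>
    <!-- Physical representation of the smart string; hidden from the infoset. -->
    <xs:element name="myStringRep">
      <xs:complexType>
        <xs:sequence>
          <!-- When unparsing, the value is computed from the logical element. -->
          <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}"
            dfdl:outputValueCalc="{ ../../myString }" />
          <xs:sequence
            dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:group>

<xs:element name="record">
  <xs:complexType>
    <xs:sequence>
      <!-- The hidden group sits next to the logical element. -->
      <xs:sequence dfdl:hiddenGroupRef="tns:hMyStringRep" />
      <!-- When parsing, the logical element takes its value from the hidden structure. -->
      <xs:element name="myString" type="xs:string"
        dfdl:inputValueCalc="{ ../myStringRep/value }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this arrangement the infoset shows only myString inside record, but every such string needs its own hidden group and pair of value calculations alongside it in the logical model, which is the clutter that dfdlx:repType is meant to avoid.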

Note that there is already a dfdlx:repType property used to enable numeric simple type values to be converted to string values from an enum type. (See: Proposal: (Superceded) Feature to support enumerations and typeValueCalc)

This usage would not conflict with that, because in this case the representation type is a complex type. 

Consider the following example. This

...

reuses the smart string from the section above. 

This first example block shows how this must be captured in DFDL v1.0 without extensions.

Code Block
<xs:element name="myString">
  <xs:complexType>
    <xs:group ref="tns:tString50Group" />
  </xs:complexType>
</xs:element>

  <xs:group name="tString50Group">
    <xs:sequence>
      <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
        dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
      <xs:sequence
        dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
    </xs:sequence>
  </xs:group>

Given data of "theString~", this results in XML instances that look like:

Code Block
<myString><value>theString</value></myString>

The proposal here is to recast this using a dfdlx:repType extension as follows:

Code Block
<xs:element name="myString" type="tns:tString50"/>

  <xs:simpleType name="tString50" dfdlx:repType="tns:tString50RepType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:complexType name="tString50RepType">
    <xs:group ref="tns:tString50Group"/> <!-- tString50Group as above -->
  </xs:complexType>

This reuses the definition of the tString50Group from above, and shows that this aspect, which describes the actual representation of this string data, is not changed from the way this must be captured in DFDL v1.0.

The extension property dfdlx:repType simply allows us to hide the complexity better. The resulting XML instances would look like:

Code Block
<myString>theString</myString>

Stylistically, given the dfdlx:repType property, one might choose to declutter the schema as shown below, eliminating the indirections through a simpleType definition and a group definition.

Code Block
<xs:element name="myString" type="xs:string" dfdlx:repType="tns:tString50Type"/>


  <xs:complexType name="tString50Type">
    <xs:sequence>
      <xs:element name="value" type="xs:string" dfdl:lengthKind="pattern"
        dfdl:lengthPattern="[^~]{0,49}(?=~)|.{50}" />
      <xs:sequence
        dfdl:terminator="{ if (fn:string-length(./value) eq 50) then '%ES;' else '~' }" />
    </xs:sequence>
  </xs:complexType>

...

This reuses the definition of the tString50Type from above, and shows that this aspect, which describes the actual representation of this string data, is not changed from the way this must be captured in DFDL v1.0.

The child element named 'value' is special. That name is a keyword that identifies that child element of simple type as providing the value for the logical element's simple type.

The extension property dfdlx:repType simply allows us to hide the complexity better.

The resulting XML instances would look like:

Code Block
<myString>theString</myString>