People using XML often dislike its character as a data language in which every data item must be enclosed in matching start and end element tags; for simple values they would rather use attributes.

However, DFDL does not support attributes.

There are two reasons for this:

  • Attributes aren't, strictly speaking, needed.
  • Attribute declarations don't appear at a useful position in the XSD schema declaration order. They must come last, after the model group definition.

This note defines a way to add XSD attributes so that when a DFDL schema is used to parse/unparse data to/from XML, that XML can use XML attributes to store values of simple type.

The following is the key idea:

  • ignore the location of the attribute declarations in the XSD/DFDL schema.
  • express the physical location of the logical attribute using a DFDL construct that can be woven in between, before, or after element content.

Example

<element name="example">
 <complexType>
   <sequence>
     <element name="first" type="someComplexType"/>
     <sequence dfdlx:attribute="attr1"/> <!-- gives schema declaration location of attribute data -->
     <element name="second" type="someOtherType"/>
     <sequence dfdlx:attribute="attr2"/> <!-- this attribute is fourth in schema declaration order -->
   </sequence>
   <attribute name="attr1" type="xs:string" use="optional"/> <!-- optional == minOccurs=0, maxOccurs=1 -->
   <attribute name="attr2" type="xs:int" use="required"/> <!-- required == scalar -->
 </complexType>
</element>

<element name="exampleWithEmptyContent">
 <complexType> <!-- no sequences with dfdlx:attribute property are needed -->
   <attribute name="attr1" type="xs:string" use="optional"/> <!-- optional == minOccurs=0, maxOccurs=1 -->
   <attribute name="attr2" type="xs:int" use="required"/> <!-- required == scalar -->
 </complexType>
</element>
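
For illustration, XML instances corresponding to the two declarations above might look like the following fragments. All values are made up, and the child content is left as a placeholder because someComplexType and someOtherType are not defined in this note.

```xml
<!-- Hypothetical instance fragments; all values are assumptions. -->
<example attr1="abc" attr2="42">
  <first><!-- content defined by someComplexType --></first>
  <second><!-- content defined by someOtherType --></second>
</example>

<exampleWithEmptyContent attr1="abc" attr2="42"/>
```

In the physical data for the first element, the representation order would be first, attr1, second, attr2, even though XML places both attributes on the start tag.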

Properties expressed on a sequence that carries the dfdlx:attribute property would be combined with properties expressed on the corresponding XSD attribute declaration.

A dfdlx:attribute annotation element would be defined, which can be added as an annotation on an attribute declaration or attribute reference.

It would allow the same DFDL properties as for elements, except that, since an attribute can at most be optional and can never be an array, some property values (such as dfdl:occursCountKind="stopValue") might not be allowed.
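
As a sketch of how this combination might look, the fragment below puts a delimiter property on the sequence that locates attr2 and the remaining representation properties in a dfdlx:attribute annotation on the declaration. The split of properties between the two places, and the particular properties chosen, are assumptions for illustration; this note does not fix them. Namespace declarations are omitted, as in the example above.

```xml
<element name="example">
  <complexType>
    <sequence>
      <element name="second" type="someOtherType"/>
      <!-- assumed: a terminator placed on the locating sequence -->
      <sequence dfdlx:attribute="attr2" dfdl:terminator=";"/>
    </sequence>
    <attribute name="attr2" type="xs:int" use="required">
      <annotation>
        <appinfo source="http://www.ogf.org/dfdl/">
          <!-- assumed: these combine with the properties on the sequence -->
          <dfdlx:attribute representation="text" lengthKind="delimited"/>
        </appinfo>
      </annotation>
    </attribute>
  </complexType>
</element>
```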

Attribute groups could also be supported via a dfdlx:attributeGroup property on sequences.
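
A minimal sketch of what that might look like follows. The group name coords, its attributes, and the element names are hypothetical, and the assumption here is that the group's attributes are represented together, in their declaration order, at the position of the sequence.

```xml
<attributeGroup name="coords">
  <attribute name="x" type="xs:int" use="required"/>
  <attribute name="y" type="xs:int" use="required"/>
</attributeGroup>

<element name="point">
  <complexType>
    <sequence>
      <element name="label" type="xs:string"/>
      <!-- assumed: x then y are represented here, after label -->
      <sequence dfdlx:attributeGroup="coords"/>
    </sequence>
    <attributeGroup ref="coords"/>
  </complexType>
</element>
```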

This preserves the fact that the DFDL schema is still an XML logical schema, while giving the schema declaration order a meaning for a mixture of attribute and element children. The schema declaration order would be the order in which the elements and attributes appear in the model group of the complex type.

Some caveats:

  • The QName/NCName given for the dfdlx:attribute="QName" must match an attribute, attribute group, attribute ref, or attribute group ref of that element.
  • A sequence with dfdlx:attribute property cannot be a child of an XSD choice. It can only be the direct child of the sequence group that is the model group of a complex type.
  • DPath expressions need the "@" notation to name attributes.
  • DPath expressions would not allow indexing to retrieve attribute values; for example, ../@attr1[1] would be a schema definition error (SDE).
  • There is no need to allow simple content with attributes. Only empty content with attributes or element-only content with attributes need be supported.
  • Escaping: since attribute values are delimited by quotation marks (single or double) in XML, any such quotation mark that appears within an attribute value must be escaped.
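
To make the DPath caveats concrete, here is a sketch in which an element's length is taken from an attribute using the "@" notation. The names record, len, and payload are hypothetical, and the relative path assumes the attribute is addressed as a child of the enclosing element.

```xml
<element name="record">
  <complexType>
    <sequence>
      <sequence dfdlx:attribute="len"/>
      <!-- "@" names the attribute; no index is used -->
      <element name="payload" type="xs:string"
               dfdl:lengthKind="explicit" dfdl:length="{ ../@len }"/>
      <!-- not allowed: "{ ../@len[1] }" would be a schema definition error -->
    </sequence>
    <attribute name="len" type="xs:int" use="required"/>
  </complexType>
</element>
```

For the escaping caveat, an attribute value containing a double quote would appear in the XML as, for example, attr1="say &quot;hi&quot;".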

If an element has empty content and only attributes, then the need for sequences with the dfdlx:attribute property is moot, and the attributes in their order of declaration will suffice to define the representation of the element type. So for small tuple-like XML elements that have only attributes as children, the schema remains simple and uncluttered.

An attribute can have dfdl:inputValueCalc or dfdl:outputValueCalc, just like an element.
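
As a sketch, an attribute holding an item count could be computed when unparsing. The names list, count, and item are hypothetical, and writing the property in short form directly on the attribute declaration is an assumption; it might equally be carried by the dfdlx:attribute annotation element.

```xml
<element name="list">
  <complexType>
    <sequence>
      <sequence dfdlx:attribute="count"/>
      <!-- when parsing, the attribute value gives the number of items -->
      <element name="item" type="xs:string" minOccurs="0" maxOccurs="unbounded"
               dfdl:occursCountKind="expression" dfdl:occursCount="{ ../@count }"/>
    </sequence>
    <!-- when unparsing, the attribute value is computed from the infoset -->
    <attribute name="count" type="xs:int" use="required"
               dfdl:outputValueCalc="{ fn:count(../item) }"/>
  </complexType>
</element>
```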

1 Comment

  1. An unordered sequence, containing subsequences with dfdlx:attribute="..." on the subsequences, is a way to have an unordered representation in the physical data stream while the logical data model uses attributes as the representation. No special additional construct is needed to create an unordered representation that uses attributes to store some or all of the values from that representation.
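
A sketch of the idea in this comment might look like the following. The element name and the initiators used to tell the two values apart in the data are assumptions, as is the premise that an unordered sequence would accept such subsequences as children.

```xml
<element name="unorderedExample">
  <complexType>
    <!-- attr1 and attr2 may appear in either order in the physical data -->
    <sequence dfdl:sequenceKind="unordered">
      <sequence dfdlx:attribute="attr1" dfdl:initiator="a1="/>
      <sequence dfdlx:attribute="attr2" dfdl:initiator="a2="/>
    </sequence>
    <attribute name="attr1" type="xs:string" use="optional"/>
    <attribute name="attr2" type="xs:int" use="required"/>
  </complexType>
</element>
```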