Code Block
languagexml
<annotation><appinfo source="http://www.ogf.org/dfdl/">
 <dfdl:defineFormat name="base64">
      <dfdl:format ref="ex:general" layerTransform="base64_MIME" layerLengthKind="boundaryMark" layerLengthUnits="bytes"
        layerEncoding="iso-8859-1" />
 </dfdl:defineFormat>
 <dfdl:defineFormat name="folded">
      <dfdl:format ref="ex:general" layerTransform="lineFolded_IMF" layerLengthKind="implicit" layerLengthUnits="bytes"
        layerEncoding="iso-8859-1" />
 </dfdl:defineFormat>
</appinfo></annotation>

    <xs:element name="root" dfdl:lengthKind="implicit">
      <xs:complexType>
        <xs:sequence dfdl:ref="folded"> <!-- From here, everything is line-folded -->
          <xs:sequence>
            <xs:element name="marker" type="xs:string"
              dfdl:initiator="boundary=" dfdl:terminator="%CR;%LF;" />
            <xs:element name="contents" dfdl:lengthKind="implicit" 
              dfdl:initiator="{ fn:concat('--', ../marker, '%CR;%LF;') }">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="comment" type="xs:string" 
                    dfdl:initiator="Comment:%SP;" dfdl:terminator="%CR;%LF;" />
                  <xs:element name="contentTransferEncoding"  type="xs:string"
                    dfdl:initiator="Content-Transfer-Encoding:%SP;"
                    dfdl:terminator="%CR;%LF;" />
                  <xs:element name="body" dfdl:lengthKind="implicit" dfdl:initiator="%CR;%LF;">
                    <xs:complexType>
                      <xs:choice dfdl:choiceDispatchKey="{ ../contentTransferEncoding }">
                        <xs:sequence dfdl:choiceBranchKey="base64">
                          <xs:sequence dfdl:ref="tns:base64"
                            dfdl:layerBoundaryMark="{ 
                              fn:concat(dfdl:decodeDFDLEntities('%CR;%LF;'),'--', ../../marker, '--')
                             }"> <!-- base64_MIME encoding for this sequence -->
                            <xs:element name="value" type="xs:string" />
                          </xs:sequence> <!-- END base64_MIME encoding --> 
                        </xs:sequence>
                        <!--
                           This is where other choice branches than base64 would go. 
                         -->
                      </xs:choice>
                    </xs:complexType>
                  </xs:element> <!-- END element body --> 
                </xs:sequence>
              </xs:complexType>
            </xs:element> <!-- END element contents -->
          </xs:sequence>
        </xs:sequence> <!-- END line folding -->
      </xs:complexType>
    </xs:element>

The data corresponding to the above schema is shown here:

Code Block
languagetext
boundary=frontier%CR;
--frontier%CR;
Comment: This simulates a header field that is so long it will get folded%CR;
 into multiple lines of text because it is too long and my job is at the%CR;
 redundancy department is where I work.%CR;
Content-Transfer-Encoding: base64%CR;
%CR;
TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdCwg%CR;
c2VkIGRvIGVpdXNtb2QgdGVtcG9yIGluY2lkaWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu%CR;
YSBhbGlxdWEuIFV0IGVuaW0gYWQ=%CR;
--frontier--

The above data uses "%CR;", a DFDL character entity, to indicate a literal carriage-return (CR) character, which is U+000D. When looking at this data, keep in mind that %CR; looks like 4 characters, but is actually only 1.
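
To make the one-character point concrete, here is a minimal Python sketch that expands a few DFDL character entities into the characters they stand for. The helper name `expand_entities` and the three-entry table are assumptions made for illustration only; they are not part of DFDL or of any DFDL implementation.

```python
import re

# A small, illustrative subset of DFDL character entities.
ENTITIES = {"CR": "\r", "LF": "\n", "SP": " "}

def expand_entities(text: str) -> str:
    """Replace %CR;, %LF; and %SP; with the single character each denotes."""
    return re.sub(r"%(CR|LF|SP);", lambda m: ENTITIES[m.group(1)], text)

line = "--frontier%CR;"
expanded = expand_entities(line)

# The entity occupies 4 characters as written, but only 1 once expanded.
print(len(line), len(expanded))
```

The written form is three characters longer than the expanded form, which is why the lengths in the data above are shorter than they look.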

In the data, notice the line initiated by "Comment:". That line has been folded twice, each time by inserting a CRLF before a space, to ensure no line is longer than 78 characters.
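
The unfolding direction can be sketched in a few lines of Python. This is only an illustration of the idea (drop each CRLF that is immediately followed by a space or tab), not the lineFolded_IMF layer itself; the function name `unfold` is hypothetical, and the sample text is the comment field from the data above with the CR and LF written out.

```python
import re

def unfold(text: str) -> str:
    """Undo line folding: drop every CRLF immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", text)

folded = (
    "Comment: This simulates a header field that is so long it will get folded\r\n"
    " into multiple lines of text because it is too long and my job is at the\r\n"
    " redundancy department is where I work.\r\n"
)

unfolded = unfold(folded)

# Every folded line respects the 78-character limit.
longest = max(len(part) for part in folded.split("\r\n"))
print(longest)
print(unfolded)
```

The final CRLF survives because it is not followed by whitespace; in the schema it is consumed as the terminator of the comment element.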

The above data parses to this DFDL infoset - presented as XML. (Apologies for the long lines, but when illustrating line wrapping/folding, they're inevitable.)

Code Block
languagexml
<ex:root>
  <marker>frontier</marker>
  <contents>
    <comment><![CDATA[This simulates a header field that is so long it will get folded into multiple lines of text because it is too long and my job is at the redundancy department is where I work.]]></comment>
    <contentTransferEncoding>base64</contentTransferEncoding>
    <body>
      <value><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad]]></value>
    </body>
  </contents>
</ex:root>

In the above, the base64 has been decoded into a long string of "Lorem ipsum" nonsense, and the line-folded comment has been unfolded. This data can be unparsed with the same DFDL schema to get back the data representation shown previously. That is to say, this data "round trips" through parsing and unparsing.
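
As a rough cross-check of the base64 step, the following Python sketch decodes the three base64 lines from the example data and re-encodes the result. It uses only the standard `base64` module and is not how a DFDL processor implements the base64_MIME layer. The constant name `ENCODED` is an assumption, and the trailing CRLF is kept on the sample purely so the round trip compares equal (in the schema, that CRLF is part of the boundary mark).

```python
import base64

# The base64 lines from the example, with %CR; plus the line break written as CRLF.
ENCODED = (
    "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdCwg\r\n"
    "c2VkIGRvIGVpdXNtb2QgdGVtcG9yIGluY2lkaWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu\r\n"
    "YSBhbGlxdWEuIFV0IGVuaW0gYWQ=\r\n"
)

# Parse direction: b64decode skips the CR and LF characters between lines.
decoded = base64.b64decode(ENCODED)
text = decoded.decode("iso-8859-1")

# Unparse direction: encodebytes emits 76-character lines ending in LF,
# so the LFs are widened to CRLF to match the data representation.
reencoded = base64.encodebytes(decoded).decode("ascii").replace("\n", "\r\n")

print(text)
print(reencoded == ENCODED)
```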

Example of Multi-layer Transformation

Here's some CSV data

Code Block
languagetext
last,first,middle,DOB
smith,robert,brandon,1988-03-24
johnson,john,henry,1986-01-23
jones,arya,cat,1986-02-19

Here's that data gzipped, which takes 115 bytes, prepended by a 4-byte binary integer holding that length (115, which is 0000 0073 in hexadecimal), then the whole thing base64 encoded:

Code Block
languagetext
AAAAcx+LCAAAAAAAAAAtyUEKgCAQheG94E1mIDWittG+M0xpaNQIo5tuX0Kb98P7LioVjiTf3sn7
K8CyzlqVO9UIkrcgFTYh9pnBTOOInUPba3XmyOX7WiEGlqfxgJ1B6xpzKEDyEOxUf7JoJq1e/RI4
wXIAAAA=

The schema that describes the CSV data without the stream transforms is this:

Code Block
languagexml
  <xs:element name="file" type="ex:fileType" />

  <xs:complexType name="fileType">
    <xs:sequence>
      <xs:element name="data" dfdl:lengthKind="implicit">
        <xs:complexType>
          <xs:sequence>
            <xs:group ref="ex:fileTypeGroup" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:group name="fileTypeGroup">
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
      <xs:element name="header" minOccurs="0" maxOccurs="1" dfdl:occursCountKind="implicit">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="title" type="xs:string" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="record" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="item" type="xs:string" maxOccurs="unbounded" dfdl:occursCount="{ fn:count(../../header/title) }"
              dfdl:occursCountKind="expression" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:group>

We can annotate this schema with additional stream transform information to enable it to describe the base64 encoded, compressed data.

One easy way to do this is by modifying the complex type definition for fileType to this:

Code Block
languagexml
    <xs:complexType name="fileType">
      <!--
           first we have the base64 details
       -->
      <xs:sequence dfdl:ref="ex:base64" dfdl:layerBoundaryMark="--END--">
        <xs:sequence>
          <!--
              now the gzip details, including the 4-byte gzLength element that stores how long
              the gzipped data is.
           -->
          <xs:element name="gzLength" type="xs:int" dfdl:representation="binary" dfdl:lengthKind="implicit"
            dfdl:outputValueCalc="{ dfdl:contentLength( ../data, 'bytes') }" />
          <!--
               this 'data' element is needed only because we have to measure how big it is when unparsing.
               If we were only worried about parsing, we wouldn't need to have this extra 'data' element wrapped around
               the contents.
           -->
          <xs:element name="data" dfdl:lengthKind="implicit">
            <xs:complexType>
              <!--
                 now the gzipped layered sequence itself
               -->
              <xs:sequence dfdl:ref="ex:gzip" dfdl:layerLength="{ ../gzLength }">
                <!--
                   finally, inside that, we have the original fileTypeGroup group reference.
                 -->
                <xs:group ref="ex:fileTypeGroup" />
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>

Along with that we need the definitions of these named stream formats and default format:

Code Block
languagexml
    <dfdl:defineFormat name="general">
      <dfdl:format ref="ex:GeneralFormat" lengthKind="delimited" outputNewLine="%CR;%LF;" layerEncoding="iso-8859-1"
        layerLengthUnits='bytes' />
    </dfdl:defineFormat>

    <dfdl:defineFormat name="base64">
      <dfdl:format ref="ex:general" layerTransform="base64_MIME" layerLengthKind="boundaryMark" />
    </dfdl:defineFormat>

    <dfdl:defineFormat name="gzip">
      <dfdl:format ref="ex:general" layerTransform="gzip" layerLengthKind="explicit" />
    </dfdl:defineFormat>

    <dfdl:format ref="ex:general" />

This schema parses the data, undoing both layers, to obtain the expected infoset:

Code Block
languagexml
<ex:file>
  <gzLength>115</gzLength>
  <data>
    <header>
      <title>last</title>
      <title>first</title>
      <title>middle</title>
      <title>DOB</title>
    </header>
    <record>
      <item>smith</item>
      <item>robert</item>
      <item>brandon</item>
      <item>1988-03-24</item>
    </record>
    <record>
      <item>johnson</item>
      <item>john</item>
      <item>henry</item>
      <item>1986-01-23</item>
    </record>
    <record>
      <item>jones</item>
      <item>arya</item>
      <item>cat</item>
      <item>1986-02-19</item>
    </record>
  </data>
</ex:file>

This schema round-trips the data: it will parse, then unparse, then parse again.
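
The two layers can be imitated outside DFDL with the Python standard library, which may help when checking test data by hand. This is a minimal sketch, not a description of how a DFDL processor works: the names `split_layers`, `unwrap`, `wrap` and `DOCUMENT_DATA` are hypothetical, the length prefix is assumed to be big-endian (consistent with the leading bytes 00 00 00 73 of the example), and the gzip bytes that Python produces will generally differ from those in the example even though they decompress to the same content.

```python
import base64
import gzip
import struct

# The base64 text from the multi-layer example.
DOCUMENT_DATA = """\
AAAAcx+LCAAAAAAAAAAtyUEKgCAQheG94E1mIDWittG+M0xpaNQIo5tuX0Kb98P7LioVjiTf3sn7
K8CyzlqVO9UIkrcgFTYh9pnBTOOInUPba3XmyOX7WiEGlqfxgJ1B6xpzKEDyEOxUf7JoJq1e/RI4
wXIAAAA=
"""

def split_layers(b64_text: str) -> tuple:
    """Undo base64, then split off the 4-byte big-endian length prefix."""
    raw = base64.b64decode(b64_text)
    (declared,) = struct.unpack(">i", raw[:4])
    return declared, raw[4:4 + declared]

def unwrap(b64_text: str) -> bytes:
    """Parse direction: base64 layer, length prefix, then gzip layer."""
    _, gz = split_layers(b64_text)
    return gzip.decompress(gz)

def wrap(payload: bytes) -> str:
    """Unparse direction: gzip, prepend the measured length, then base64."""
    gz = gzip.compress(payload, mtime=0)
    framed = struct.pack(">i", len(gz)) + gz
    return base64.encodebytes(framed).decode("ascii")

if __name__ == "__main__":
    declared, gz = split_layers(DOCUMENT_DATA)
    print(declared, gz[:2].hex())  # stored length and the gzip magic number
    csv = b"last,first,middle,DOB\r\nsmith,robert,brandon,1988-03-24\r\n"
    print(unwrap(wrap(csv)) == csv)
```

Note that `wrap` has to compress first and only then write the length, which mirrors why the schema computes gzLength with dfdl:outputValueCalc over the 'data' element. Calling `unwrap(DOCUMENT_DATA)` should yield the CSV text of the example.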

Summary

  • Allows stacking transforms one on top of another, so you can have base64-encoded compressed data as the payload representation of
    a child element within a larger element.

  • Allows specifying properties of the underlying data layers separately from the properties of the logical data.

  • Scopes the transforms over an xs:sequence body only.

  • Avoids new annotation elements with particulars about scoping.
  • Simple: doesn't add new functions for layering use when existing dfdl:contentLength will already handle it.
  • Complex cases - e.g., initiator before layered data, are handled by encapsulating the layered sequence in another sequence or element that carries the initiator.
  • Layer annotations are only about the determining of the length of the layered region, and the algorithm for transforming the data.
  • Layer transforms have mandatory layer alignment (1 byte for now)

Open design issues

  • Parameterization of transform algorithms - many algorithms will have variations that can be controlled by parameters. However, it is unclear whether a parameterization method is truly required, or whether there can just be a large number of individual transforms, each having a specific configuration of parameters. Experience with these concepts will be needed before there is enough information to propose ideas here.
  • Debug and trace impact, and how to provide visibility to what is going on when an error occurs in the middle of parsing/unparsing when transforms are in use. E.g., the bit/byte position where a run time parse error occurs would be in some transformed stream, not the underlying stream. I suspect some experience with these transform concepts will be needed before there will be enough information to propose ideas here.


The material below is for the future, once Quoted-Printable has been implemented.

VCalendar Example Using Quoted-Printable

Consider this VCALENDAR Data:

Code Block
languagetext
BEGIN:VCALENDAR
PRODID:
VERSION:1.0
BEGIN:VEVENT
DTSTART:20170903T170000Z
DTEND:20170903T173000Z
LOCATION:test location
UID:040000008200E00074C5B7101A82E0080000000010156B50B224D301000000000000000
    01000000083A43200A4E43F4E800BE12703B99BF0
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:=
 Text that will require line folding: Lorem ipsum dolor sit amet, consecte=
 tur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore=
 magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco=
 laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor i=
 n reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla par=
 iatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui =
 officia deserunt mollit anim id est laborum.=0D=0A=0D=0A =0D=0A=0D=0A=0D==
 =0A
SUMMARY:test subject
PRIORITY:3
END:VEVENT
END:VCALENDAR

We want to create a schema that describes this.

In the above there are two behaviors that require use of stream transforms. First is the UID. This has been broken to a maximum line length of 76 characters by way of the folded-lines transformation.

The second is the DESCRIPTION which uses a transformation called QUOTED-PRINTABLE which both achieves short line lengths, and also enables embedding of CR, LF, and other characters at the ends of lines.
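
For readers unfamiliar with QUOTED-PRINTABLE, this short Python sketch shows its two relevant behaviors using the standard `quopri` module: a trailing "=" marks a soft line break that disappears on decoding, and "=0D=0A" encodes a literal CRLF. The sample string is made up for illustration and deliberately omits the leading-space folding seen in the VCALENDAR data; it says nothing about how a quotedPrintable layer would be implemented.

```python
import quopri

# A soft line break ("=" at end of line) followed by an encoded CRLF ("=0D=0A").
sample = b"Lorem ipsum dolor sit amet, consecte=\r\ntur adipiscing elit.=0D=0A"

decoded = quopri.decodestring(sample)

# The soft break is gone and the escaped CRLF is now a real CRLF.
print(decoded)
```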

The result is that we want this XML Infoset:

Code Block
languagexml
<VCalendar>
  <ProdID>-//Microsoft Corporation//Outlook 15.0 MIMEDIR//EN</ProdID>
  <Version>1.0</Version>
  <VEvent>
    <DTStart></DTStart>
    <DTEnd></DTEnd>
    <Location>test location</Location>
    <UID>040000008200E00074C5B7101A82E0080000000010156B50B224D30100000000000000001000000083A43200A4E43F4E800BE12703B99BF0</UID>
    <Description>
      <Encoding>QUOTED-PRINTABLE</Encoding>
      <QP/>
      <Value>Text that will require line folding: Lorem ipsum dolor sit
amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut 
labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud 
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum 
dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non 
proident, sunt in culpa qui officia deserunt mollit anim id est 
laborum.&#xE00D;
&#xE00D;
 &#xE00D;
&#xE00D;
&#xE00D;
</Value>
    </Description>
    <Summary>test subject</Summary>
    <Priority>3</Priority>
  </VEvent>
</VCalendar>

Notice the CRLFs at the end. Each CR is remapped to the Private Use Area (PUA) character U+E00D and appears as a numeric character entity.

The DFDL schema for this, including the specification of the layering transform behaviors, is:

Code Block
languagexml
<xs:schema ....>

 <dfdl:format separatorPosition="infix" lengthKind="boundaryMark" encoding="utf-8"
  occursCountKind="parsed" separator="" sequenceKind="ordered"/>

 <dfdl:defineFormat name="folded">
  <dfdl:format layerTransform="foldedLines" layerLengthKind="boundaryMark" layerEncoding="us-ascii"/>
  <!-- boundaryMark here means to enclosing end-of-data, as no boundary mark delimiter is defined. -->
 </dfdl:defineFormat>

 <dfdl:defineFormat name="qp">
  <dfdl:format layerTransform="quotedPrintable" layerLengthKind="pattern"
     layerLengthPattern="[^\n]*?(?=(?&lt;!=)\n)"/>

  <!-- QPs are terminated by a newline that is not preceded by an =.
       This final newline is not consumed as part of the content. -->

  <!-- Alternatively, the QP transform itself can determine the length
       by searching for this final newline (but leaving it there).
       In which case the lengthKind would be "implicit" -->
 </dfdl:defineFormat>

 <xs:element name="VCalendar" dfdl:initiator="BEGIN:VCALENDAR%NL;" dfdl:terminator="END:VCALENDAR%NL; END:VCALENDAR">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:sequenceKind="unordered">
      <xs:sequence dfdl:ref="tns:folded">
        <xs:element name="ProdID" type="xs:string" dfdl:initiator="PRODID:" minOccurs="0"/>
      </xs:sequence>
      <xs:element name="Version" type="xs:string" dfdl:initiator="VERSION:" minOccurs="0" />
      <xs:element name="VEvent" maxOccurs="unbounded" minOccurs="0" dfdl:occursCountKind="parsed"
        dfdl:initiator="BEGIN:VEVENT%NL;" dfdl:terminator="END:VEVENT">
        <xs:complexType>
          <xs:sequence dfdl:separator="%NL;" dfdl:sequenceKind="unordered">
            <xs:element name="DTStart" type="xs:string" dfdl:initiator="DTSTART:" />
            <xs:element name="DTEnd" type="xs:string" dfdl:initiator="DTEND:" />
            <!--
              content from here could have long lines, so must be folded
            -->
            <xs:sequence dfdl:ref="tns:folded">
              <xs:element name="Location" type="xs:string" dfdl:initiator="LOCATION:" minOccurs="0"/>
              <xs:element name="UID" type="xs:string" dfdl:initiator="UID:" minOccurs="0"/>
              <xs:element name="Description" dfdl:initiator="DESCRIPTION:" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Encoding" type="xs:string"
                      dfdl:initiator="ENCODING=" dfdl:terminator=":" minOccurs="0" />
                    <xs:choice dfdl:choiceDispatchKey="{ if (fn:exists(./Encoding)) then ./Encoding else '' }">
                      <!--
                        we inspect the value of the Encoding element and decide what branch of the choice
                        based on it
                      -->
                      <xs:sequence dfdl:choiceBranchKey="QUOTED-PRINTABLE"
                        dfdl:separator="" dfdl:sequenceKind="unordered">
                        <!--
                          Each branch starts with a distinct dummy element to satisfy the UPA rules of XML Schema
                        -->
                        <xs:element name="QP" type="xs:string" dfdl:inputValueCalc="{ '' }" />
                        <!--
                          Notice here that the layerRef for the qp data is scoped to just this inner element.
                        -->
                        <xs:sequence dfdl:ref="tns:qp">
                          <xs:element name="Value" type="xs:string"/>
                        </xs:sequence><!-- end layer quoted printable -->
                      </xs:sequence>
                      <!--
                        repeat the above pattern for the choice branches for the various encodings
                      -->
                    </xs:choice>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="Summary" type="xs:string"  dfdl:initiator="SUMMARY:" minOccurs="0"/>
              <xs:element name="Priority" type="xs:string" dfdl:initiator="PRIORITY:" minOccurs="0" />
            </xs:sequence><!-- end folded layer -->
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

Message Box
titleImplementation Note
typegeneric

The implementation is a bunch of front end property and annotation stuff, but ultimately the runtime is a combinator for parse which at its start puts in place the stream transform, and uses this transformed stream for all the "body" and sub parse/unparse activity. At the exit of this combinator the stream transform is removed and shutdown/error-checking occurs (for left-over data among other things). Presumably these combinators interact only with the I/O layer, not the infoset. These combinators would "stack" naturally providing the right behavior when multiple transforms are layered on top of each other.

For unparsing, it is not so simple, due to the need for suspensions to support dfdl:outputValueCalc - unparsing doesn't work via the basic combinator model we see when parsing.