...

    • tunable name: xmlOutputStyle
    • value: a whitespace-separated list of tokens drawn from this set:
      • "default" - the current behavior. Acceptable if the data will not be pretty printed, will not be read back in, or if whitespace is fungible in the actual data format.
      • "prettyPrintSafe" - preserves the XML Infoset exactly, including whitespace characters. The resulting XML can be pretty printed without indentation changes modifying element values.
      • other values are reserved for future use.
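A minimal sketch of how the tunable value might be validated, using Python for illustration only. The function name parse_xml_output_style, and the choice to treat an empty value as "default", are hypothetical and not part of this proposal.

```python
# Hypothetical sketch: validating the xmlOutputStyle tunable value.
KNOWN_TOKENS = frozenset({"default", "prettyPrintSafe"})


def parse_xml_output_style(value: str) -> frozenset:
    """Split a whitespace-separated xmlOutputStyle value into a set of tokens.

    Assumptions (not from the proposal): an empty value means "default",
    and any token outside KNOWN_TOKENS is rejected as reserved.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    tokens = frozenset(value.split())
    if not tokens:
        return frozenset({"default"})
    unknown = tokens - KNOWN_TOKENS
    if unknown:
        raise ValueError(f"reserved xmlOutputStyle token(s): {sorted(unknown)}")
    return tokens


print(sorted(parse_xml_output_style("  prettyPrintSafe \n")))
```

Rejecting unknown tokens outright keeps the reserved values free for later additions, such as the addXSITypes token mentioned under Algorithm.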

Assumptions & Limitations

We assume pretty printers are bound by only a small set of constraints on how they inject whitespace for indentation or line breaking:

...

It follows that if all significant whitespace is within CDATA regions, the data can be pretty printed without affecting the significant whitespace.

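For example, the following reformatting is not allowed, because the two forms are not equivalent: the newline and indentation added around the CDATA region become part of the value of foo.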

<foo><![CDATA[some stuff]]></foo>

<!-- reformatted to --> 

<foo>
  <![CDATA[some stuff]]>
</foo>
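The difference can be observed with any conforming XML parser. A small check using Python's standard xml.etree.ElementTree, chosen here only for illustration, shows that the injected whitespace becomes part of the element's text:

```python
import xml.etree.ElementTree as ET

original = "<foo><![CDATA[some stuff]]></foo>"
reformatted = "<foo>\n  <![CDATA[some stuff]]>\n</foo>"

# ElementTree merges CDATA content into the element's ordinary text, so the
# newline and indentation injected around the CDATA section change the value.
print(repr(ET.fromstring(original).text))     # 'some stuff'
print(repr(ET.fromstring(reformatted).text))  # '\n  some stuff\n'
```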

Algorithm

  • assumes the text consists entirely of XML-legal characters
    • i.e., XML-illegal characters have already been remapped (for example, NUL -> U+E000 and Ctrl-A -> U+E001).
    • see: https://daffodil.apache.org/infoset/ section "XML Illegal Characters"
    • see also: the Daffodil source code object XMLUtils, method remapXMLIllegalCharToPUA, and the other methods that invert this conversion.
  • assumes we know which values are strings and which are not; for non-string values, whitespace around the value is fungible.
    • this requires the infoset outputter to have access to the primitive type at the time it is outputting the value.
      • ex: <someHexBinary xsi:type="xs:hexBinary">  AF29B3 </someHexBinary>, where the whitespace does not (and should not) matter.
      • ex: <someDouble xsi:type="xs:double">    6.847   </someDouble>, where again the whitespace does not matter.
      • NOTE: we should verify that infoset inputters do not trip over such whitespace around non-string simple values.
      • NOTE: DAFFODIL-182 could also be addressed in this same change set by adding another xmlOutputStyle token, 'addXSITypes', which would make the infoset outputter also add xsi:type attributes to the simple elements.
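One possible way to realize this for a single simple value, sketched in Python under stated assumptions: for string values, each run of XML whitespace is wrapped in its own CDATA section and all other text is escaped normally; for non-string values, plain escaping suffices because surrounding whitespace is fungible. The helper name simple_value_text and its is_string flag are hypothetical stand-ins for whatever type information the infoset outputter actually has, and this is not necessarily the scheme the final implementation uses.

```python
import re
from xml.sax.saxutils import escape

# XML whitespace characters (XML 1.0 production S). Note: XML parsers
# normalize CR line endings even inside CDATA, so a bare CR still does not
# round-trip; handling that is outside this sketch.
_XML_WS_RUN = re.compile(r"([ \t\r\n]+)")


def simple_value_text(value: str, is_string: bool) -> str:
    """Return the XML character content for one simple-element value.

    Hypothetical helper: assumes `value` already has XML-illegal characters
    remapped to the Private Use Area, and that the caller knows whether the
    element's primitive type is a string type.
    """
    if not is_string:
        # Whitespace around non-string values (xs:double, xs:hexBinary, ...)
        # is fungible, so ordinary escaping of &, < and > is enough.
        return escape(value)
    parts = []
    # With a capturing group, re.split returns [text, ws, text, ws, ..., text],
    # so whitespace runs sit at odd indices.
    for i, segment in enumerate(_XML_WS_RUN.split(value)):
        if not segment:
            continue  # skip empty leading/trailing pieces
        if i % 2 == 1:
            # A whitespace run can never contain "]]>", so one CDATA
            # section per run is always well-formed.
            parts.append(f"<![CDATA[{segment}]]>")
        else:
            parts.append(escape(segment))
    return "".join(parts)


print(simple_value_text("  a < b ", True))
print(simple_value_text(" 6.847 ", False))
```

Wrapping only the whitespace runs, rather than the whole value, keeps ordinary text readable and avoids having to split values that happen to contain "]]>".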

...