...

At this point it should be apparent that the algorithm is a confusing mess. We had assumed that frag bytes always represent byte-aligned bytes of data, and that assumption is ultimately not being maintained. The design in place here is not up to the job of handling a buffered data output stream that does not start on a byte boundary.

Ideas for possible fixes include

Starting Frag Byte

  • Add a starting frag byte to buffered streams.
    • This byte contains a partial byte to ensure that the buffered whole bytes are always on byte boundaries, and that the frag byte(s) are always on byte boundaries.
    • This doesn't work, because the starting position of a buffered data output stream isn't necessarily known. It is known in this example because element 'a' has fixed length, but if element 'a' had variable length we would have no idea where the starting position is, or where any byte boundaries actually occur in the data.
  • We could fix the above by insisting that OVC elements have fixed length. DFDL doesn't require this, however, so it is not a particularly stable solution.
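The dependence on the starting position can be shown with a small Python sketch. The function name `leading_frag_bits`, the 0-based absolute bit position, and the bit lengths used for 'a' are all assumptions made for illustration; none of them come from the Daffodil code base.

```python
def leading_frag_bits(start_bit_pos0b: int) -> int:
    """Bits a starting frag byte must hold so the next bit is byte-aligned.

    start_bit_pos0b is a hypothetical 0-based absolute bit position at which
    the buffered data output stream begins.
    """
    return (8 - start_bit_pos0b % 8) % 8


# If 'a' had a fixed length of, say, 4 bits, the buffered stream would start
# at bit 4, and the size of the starting frag byte could be computed up front.
fixed = leading_frag_bits(4)

# If 'a' has variable length, every candidate length gives a different
# answer, so the size cannot be chosen until the length of 'a' is known.
candidates = {length: leading_frag_bits(length) for length in (3, 4, 11, 16)}
```

The point of the sketch is only that the result is a function of the start position, which is exactly the value that is unavailable while 'a' is still suspended.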

Split on Bit-order Change

  • Any time bit-order changes, split off yet another buffered data output stream. Record the bit order of every buffered output stream.
  • When eventually collapsing the data output streams together, check that we in fact end up changing bit orders on proper byte boundaries or issue a runtime SDE.
    • Probably need to save information on each buffered data output stream for diagnostic purposes in issuing this error. E.g., the Term's Runtime Data object.

 

The solution to this problem is to split the data output stream again, on every bit order change.

  • Any time bit-order changes, split off yet another buffered data output stream.
  • Record the bit order of every buffered output stream.
  • When eventually collapsing the data output streams together, check that we end up changing bit orders on proper byte boundaries or issue a runtime SDE.
  • A detail is to save information on each buffered data output stream for diagnostic purposes in issuing this error. E.g., the Term's Runtime Data object.
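The steps above can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not the Daffodil implementation: the names `BufferedStream`, `Unparser`, `put_bits`, `collapse`, and `BitOrderChangeError` are hypothetical, the exception merely stands in for a runtime SDE, and `term_info` is a plain string standing in for the Term's Runtime Data object. The sketch tracks only bit counts and positions; it does not model how each bit order lays bits out within a byte.

```python
from dataclasses import dataclass, field
from typing import List


class BitOrderChangeError(Exception):
    """Hypothetical stand-in for a runtime SDE."""


@dataclass
class BufferedStream:
    bit_order: str                      # "MSBF" or "LSBF"
    term_info: str                      # saved for diagnostics only
    bits: List[int] = field(default_factory=list)


class Unparser:
    def __init__(self, initial_bit_order: str, term_info: str):
        self.streams = [BufferedStream(initial_bit_order, term_info)]

    def put_bits(self, bit_order: str, term_info: str, bits: List[int]) -> None:
        current = self.streams[-1]
        if bit_order != current.bit_order:
            # Bit order changed: split off another buffered stream and
            # record its bit order along with diagnostic information.
            current = BufferedStream(bit_order, term_info)
            self.streams.append(current)
        current.bits.extend(bits)

    def collapse(self, start_bit_pos0b: int = 0) -> List[int]:
        # Only at collapse time are absolute positions known, so this is
        # where the byte-boundary check on bit-order changes can be made.
        pos = start_bit_pos0b
        out: List[int] = []
        prev = None
        for stream in self.streams:
            changed = prev is not None and stream.bit_order != prev.bit_order
            if changed and pos % 8 != 0:
                raise BitOrderChangeError(
                    f"bit order changed to {stream.bit_order} at bit {pos}, "
                    f"not on a byte boundary (term: {stream.term_info})"
                )
            out.extend(stream.bits)
            pos += len(stream.bits)
            prev = stream
        return out
```

In this sketch, eight MSBF bits followed by an LSBF element collapse cleanly, while four MSBF bits followed by an LSBF element raise the error, and the message names the term saved on the offending stream.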

To illustrate this, let's go back to where we have just unparsed element 'b' and we're about to unparse element 'c'.

...

[Diagram: morph60]

At this point, all the data output streams have been collapsed, and all subsequent unparsing will be going to this last remaining live data output stream. The collapsing ends the need for the data output streams to first buffer all output. So the overhead introduced by all the above machinery is no longer encountered, unless the need again arises due to another element with dfdl:outputValueCalc, another change of bit order, or the need for alignment fill regions or final unused regions of unknown length.

This algorithm can achieve streaming behavior when unparsing. Note that the delayed unparsing of element 'a' lasted until element 'd' was received and unparsed into a buffered data output stream. At that point the unparser outputs the data for elements 'a', 'b', 'c', and most of 'd'; the remaining 7 bits of 'd' are left in a frag byte. The output of data has then caught up to the streaming in of infoset events.
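The transition from buffering to direct output can be sketched in Python as below. This is a simplified illustration, assuming whole bytes only (no frag bytes, bit orders, or suspensions); the class `DataOutputStream` and its methods `add_buffered`, `write`, and `set_finished` are hypothetical names chosen for the sketch and are not claimed to match Daffodil's classes.

```python
import io


class DataOutputStream:
    """Buffers writes until it becomes direct, then writes to the sink."""

    def __init__(self, sink=None):
        self.sink = sink            # real output; None while still buffering
        self.buffer = bytearray()
        self.finished = False
        self.following = None

    def add_buffered(self):
        # Split off a new buffered stream that follows this one.
        nxt = DataOutputStream()
        self.following = nxt
        return nxt

    def write(self, data: bytes) -> None:
        if self.sink is not None:
            self.sink.write(data)   # direct: no buffering overhead
        else:
            self.buffer += data     # buffered: position not yet reachable

    def set_finished(self) -> None:
        self.finished = True
        if self.sink is None:
            return                  # a predecessor will collapse us later
        # Collapse forward: each follower becomes direct and is flushed.
        stream = self
        while stream.finished and stream.following is not None:
            nxt = stream.following
            nxt.sink = stream.sink
            nxt.sink.write(bytes(nxt.buffer))
            nxt.buffer.clear()
            stream = nxt


sink = io.BytesIO()
first = DataOutputStream(sink)      # direct stream, waiting on element 'a'
rest = first.add_buffered()
rest.write(b"bcd")                  # 'b', 'c', 'd' arrive first and are buffered
first.write(b"a")                   # the delayed value of 'a' becomes available
first.set_finished()                # collapse: buffered data reaches the sink
rest.write(b"e")                    # later elements are written directly
```

After `first.set_finished()`, the remaining stream is direct, so the final write goes straight to the sink without passing through a buffer.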

This discussion did not cover several other important aspects of the unparser algorithm:

  • suspended unparsers - for alignment fill, unused regions, and for expressions and dfdl:outputValueCalc.
    • TBD: expressions involving variables yet to be set. 
  • queueing of suspensions on the infoset, and infoset event detection including open/final infoset nodes.
  • capture of start/end positions
  • propagation of start/end positions, and of the start positions and lengths of data output streams

Those will be covered in other pages.