A Subtle Bit-Order Problem for Unparsing

Consider unparsing data described by this schema:

<dfdl:format lengthKind="explicit"
   byteOrder="littleEndian"
   bitOrder="mostSignificantBitFirst"
   lengthUnits="bits"
   alignment="1" alignmentUnits="bits"/>

<element name="r" dfdl:lengthKind="implicit">
  <complexType>
    <sequence>
      <element name="x" type="xs:int" dfdl:length="8"/>
      <element name="a" type="xs:int" dfdl:outputValueCalc="{ dfdl:valueLength(../d) }"
                        dfdl:length="5"/>
      <element name="b" type="xs:int" dfdl:length="3"/>
      <element name="c" type="xs:int" dfdl:length="4" dfdl:bitOrder="leastSignificantBitFirst"/>
      <element name="d" type="xs:string" dfdl:lengthKind="pattern"
           ... string in an LSBF encoding such as the 7-bit packed ASCII one ..../>
      ...
    </sequence>
  </complexType>
</element>

The important feature above is that at element 'c', the bit order changes from MSBF to LSBF. This change happens immediately after element 'b', which in turn follows element 'a'. Element 'a' has dfdl:outputValueCalc, so its value cannot be computed until the infoset element 'd' is available.

This should be fine: 'x' is 8 bits wide, 'a' is 5 bits wide, and 'b' is 3 bits wide, so 'c' starts at bit 16 and we are logically on a byte boundary when the bit order changes.
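The alignment arithmetic can be checked mechanically. The short Python sketch below sums the explicit lengths from the schema to get each element's starting bit position. It assumes, as this paragraph does, that a bit-order change is only acceptable at a byte boundary; the names `FIELD_BITS` and `start_offsets` are purely illustrative.

```python
# Explicit lengths, in bits, of the simple elements in the schema above
# (illustrative table, not part of any DFDL API).
FIELD_BITS = [("x", 8), ("a", 5), ("b", 3), ("c", 4)]


def start_offsets(fields):
    """Return the absolute starting bit position of each element."""
    offsets = {}
    pos = 0
    for name, width in fields:
        offsets[name] = pos
        pos += width
    return offsets


offsets = start_offsets(FIELD_BITS)
for name, pos in offsets.items():
    # Assumed rule: a bit-order change is only allowed where pos % 8 == 0.
    print(f"{name}: starts at bit {pos}, byte aligned: {pos % 8 == 0}")
```

Only 'c' needs the check here, since it is the element whose bit order differs; it starts at bit 16.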

Now let's consider this infoset:

<r>
  <x>255</x> <!-- hex FF binary 1111 1111-->
  <a>22</a> <!-- binary 10110 -->
  <b>7</b> <!-- binary 111 -->
  <c>6</c> <!-- binary 0110 -->
  <d>abc</d>
  ...
</r>
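To double-check the binary renderings in the comments, a hypothetical helper `to_bits` can format each value at its declared width. It only produces the bit string, most significant bit first; where those bits land inside a byte is a separate question governed by the bit order.

```python
# (element, value, width in bits) for the simple elements of the sample infoset.
values = [("x", 255, 8), ("a", 22, 5), ("b", 7, 3), ("c", 6, 4)]


def to_bits(value, width):
    """Render a non-negative value as a fixed-width binary string."""
    if value < 0 or value >= 1 << width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


for name, value, width in values:
    print(name, value, to_bits(value, width))
```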

Next we step through what happens during unparsing, driven by the arrival of infoset events at the unparser.

Our start state has a DataOutputStream that is 'direct', meaning it is attached to an actual Java output stream. Because DFDL and Daffodil are very bit-oriented, we implement bit-level I/O on top of that stream by pairing it with a "frag byte": a single byte holding from 0 to 8 bits, which form the fragment of a byte that is not yet complete and so not yet ready to be written to the actual output stream.
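As a sketch of the whole-bytes-plus-frag-byte idea, assuming nothing about Daffodil's real classes, the hypothetical `BitOutputStream` below keeps completed bytes in a `bytearray` and accumulates partial bits in an integer frag byte. For MSBF it fills the frag byte from the most significant end; for LSBF it fills from the least significant end, taking the value's least significant bit first. The `set_bit_order` check encodes the assumed rule that the order may only change while the frag byte is empty.

```python
class BitOutputStream:
    """Whole bytes plus a 'frag byte' of not-yet-complete bits.

    A hypothetical model for illustration; not Daffodil's DataOutputStream.
    """

    def __init__(self, bit_order="MSBF"):
        self.whole = bytearray()  # completed bytes (stands in for the Java stream)
        self.frag = 0             # the partially filled frag byte
        self.frag_len = 0         # bits held in the frag byte (0-7 between calls here)
        self.bit_order = bit_order

    def set_bit_order(self, bit_order):
        # Assumed rule: the bit order may only change when the frag byte is empty.
        if bit_order != self.bit_order and self.frag_len != 0:
            raise ValueError(f"bit order change with {self.frag_len} bits in frag byte")
        self.bit_order = bit_order

    def put_bits(self, value, width):
        for k in range(width):
            if self.bit_order == "MSBF":
                bit = (value >> (width - 1 - k)) & 1     # most significant bit first
                self.frag |= bit << (7 - self.frag_len)  # fill from bit 7 downward
            else:
                bit = (value >> k) & 1                   # least significant bit first
                self.frag |= bit << self.frag_len        # fill from bit 0 upward
            self.frag_len += 1
            if self.frag_len == 8:
                # A full frag byte flows into the whole bytes and is reset.
                self.whole.append(self.frag)
                self.frag, self.frag_len = 0, 0


stream = BitOutputStream()
stream.put_bits(255, 8)  # element 'x'
print(stream.whole.hex().upper(), stream.frag_len)
```

In this model, writing 'a' (22, 5 bits) and then 'b' (7, 3 bits) on the same single stream would complete a second byte, 0xB7 (binary 1011 0111).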

We'll depict a data output stream as follows. This example shows a data output stream where 4 whole bytes plus 2 additional bits have been written.

On the left is the JVM stream (or a buffer) holding whole bytes, which we write in hex. On the right is the frag byte, which, once filled, flows into the whole-bytes part, at which point the frag byte is reset. Beneath the frag byte we show the current bit order (here MSBF) and the number of bits in the fragment (here 2). The frag byte shows the significant bits, with X marking bits not yet occupied by unparsed data.
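The diagram convention can also be written out as text. The hypothetical `render` function below produces one line per stream: hex whole bytes, then the frag byte with X for unoccupied positions, then the bit order and bit count. The "(empty)" marker is this sketch's own convention, and the whole-byte values in the example call are arbitrary placeholders.

```python
def render(whole: bytes, frag: int, frag_len: int, bit_order: str) -> str:
    """One-line text form of a data output stream diagram (illustrative only)."""
    hex_part = " ".join(f"{b:02X}" for b in whole) or "(empty)"
    bits = format(frag, "08b")  # bit 7 on the left, bit 0 on the right
    if bit_order == "MSBF":
        # MSBF occupies the high-order end first; unused low-order bits print as X.
        shown = bits[:frag_len] + "X" * (8 - frag_len)
    else:
        # LSBF occupies the low-order end first; unused high-order bits print as X.
        shown = "X" * (8 - frag_len) + bits[8 - frag_len:]
    return f"{hex_part} | {shown} {bit_order} {frag_len}"


# Four arbitrary whole bytes plus 2 bits (binary 10) in the frag byte.
print(render(bytes([0x12, 0x34, 0x56, 0x78]), 0b10000000, 2, "MSBF"))
# 12 34 56 78 | 10XXXXXX MSBF 2
```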

So let's unparse the infoset shown in the XML above.

We start from an empty data output stream. The first event is Start-Element 'r'. No I/O occurs, because this root element has nothing of its own in the representation, so the data output stream remains empty:

We then get a Start-Element 'x' event and, since 'x' is of simple type, its value, 255.

This is a whole byte, and the stream is currently byte-aligned (because it is empty), so the value is written directly to the whole-bytes part of the data output stream:
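The same step in a minimal, self-contained form: a hypothetical `put_msbf` helper appends bits to an MSBF frag byte and flushes it to the whole bytes when it fills. Because the stream is empty, all 8 bits of 255 complete a byte immediately and the frag byte ends up empty again.

```python
def put_msbf(whole, frag, frag_len, value, width):
    """Append `width` bits of `value`, MSBF; return the new (frag, frag_len)."""
    for i in range(width - 1, -1, -1):
        frag |= ((value >> i) & 1) << (7 - frag_len)
        frag_len += 1
        if frag_len == 8:
            whole.append(frag)  # the full frag byte becomes a whole byte
            frag, frag_len = 0, 0
    return frag, frag_len


whole = bytearray()
frag, frag_len = 0, 0
# Start-Element 'r': nothing is written.
# Start-Element 'x' with value 255: 8 bits on a byte-aligned stream.
frag, frag_len = put_msbf(whole, frag, frag_len, 255, 8)
print(whole.hex().upper(), frag_len)  # FF 0
```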

Next to unparse is element 'a'. However, 'a' has dfdl:outputValueCalc, which depends on the infoset event for element 'd', which we don't have yet. So now the unparser works its magic and splits off a buffered data output stream. Below, the direct stream is on the left and the buffered stream is on the right.
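One way to model the split, under assumptions that go beyond what is stated here: the direct stream keeps everything written so far, a hypothetical `Suspension` record remembers that 'a' (5 bits) is waiting on 'd' and will eventually be written at the end of the direct stream, and a new, empty buffered stream receives all output that follows 'a'. `Stream` and `Suspension` are illustrative names, not Daffodil classes.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """A data output stream: whole bytes plus an MSBF frag byte (sketch only)."""
    kind: str  # "direct" or "buffered"
    whole: bytearray = field(default_factory=bytearray)
    frag: int = 0
    frag_len: int = 0

    def put_msbf(self, value: int, width: int) -> None:
        for i in range(width - 1, -1, -1):
            self.frag |= ((value >> i) & 1) << (7 - self.frag_len)
            self.frag_len += 1
            if self.frag_len == 8:
                self.whole.append(self.frag)
                self.frag, self.frag_len = 0, 0


@dataclass
class Suspension:
    """Hypothetical record of an element whose dfdl:outputValueCalc must wait."""
    element: str
    width_bits: int
    waiting_for: str
    target: Stream  # the stream the element is assumed to be written into later


direct = Stream("direct")
direct.put_msbf(255, 8)  # element 'x'

# 'a' cannot be computed until 'd' arrives: record it against the direct stream
# and send everything after 'a' to a newly split-off buffered stream.
pending = Suspension("a", 5, "d", direct)
buffered = Stream("buffered")
streams = [direct, buffered]  # left to right, as in the diagram
```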

We now suspend the computation of element 'a' until later and proceed to unparse element 'b' into the buffered stream. This results in:

We see that the 3 bits representing the value 7, binary 111, are in the frag byte of the buffered stream.
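The state after 'b' can be reproduced with a small sketch (hypothetical names again). It puts two numbers side by side: the buffered stream's own frag byte holds 3 bits, because that stream began empty, while the logical position, counting the 8 bits of 'x' and the 5 bits reserved for the suspended 'a', is 16 bits, a byte boundary.

```python
def put_msbf(frag, frag_len, value, width):
    """Add `width` bits of `value` to an MSBF frag byte without flushing."""
    if frag_len + width > 7:
        raise ValueError("this sketch only handles partial bytes")
    for i in range(width - 1, -1, -1):
        frag |= ((value >> i) & 1) << (7 - frag_len)
        frag_len += 1
    return frag, frag_len


# The buffered stream starts out with an empty frag byte.
buf_frag, buf_frag_len = 0, 0
# Element 'b' (value 7, 3 bits, MSBF) goes into the buffered stream.
buf_frag, buf_frag_len = put_msbf(buf_frag, buf_frag_len, 7, 3)
shown = format(buf_frag, "08b")[:buf_frag_len] + "X" * (8 - buf_frag_len)
print(shown, "MSBF", buf_frag_len)  # 111XXXXX MSBF 3

# Logical position: 'x' (8 bits) + suspended 'a' (5 bits) + 'b' (3 bits).
logical_bits = 8 + 5 + 3
print(logical_bits, logical_bits % 8 == 0)  # 16 True
print(buf_frag_len == 0)  # False
```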
