
There is a bug in bit-order handling when unparsing. This bug came up in Link16 when surrounded by NACT envelopes because NACT is bigEndian MSBF, and Link16 is littleEndian LSBF.

This example recreates what the problem is.

Consider unparsing data described by this schema:

<dfdl:format lengthKind="explicit"
   byteOrder="littleEndian"
   bitOrder="mostSignificantBitFirst"
   lengthUnits="bits"
   alignment="1" alignmentUnits="bits"/>

<element name="r" dfdl:lengthKind="implicit">
  <complexType>
    <sequence>
      <element name="x" type="xs:int" dfdl:length="8"/>
      <element name="a" type="xs:int" dfdl:length="5"
               dfdl:outputValueCalc="{ dfdl:valueLength(../d, 'bits') }"/>
      <element name="b" type="xs:int" dfdl:length="3"/>
      <element name="c" type="xs:int" dfdl:length="4" dfdl:bitOrder="leastSignificantBitFirst"/>
      <element name="d" type="xs:string" dfdl:lengthKind="pattern"
           ... string in a LSBF encoding like the 7-bit ascii packed one ..../>
      ...
    </sequence>
  </complexType>
</element>

The important feature above is that at element 'c', the bit order changes from MSBF to LSBF. And this change occurs while element 'a' is still pending: 'a' has dfdl:outputValueCalc, so it must wait until we have the infoset element 'd' before its value can be computed.

This should be fine, because 'x' is 8 bits wide, 'a' is 5 bits wide, and 'b' is 3 bits wide, so we're logically on a byte boundary (8 + 5 + 3 = 16 bits) when this change occurs.
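As a quick sanity check of that arithmetic (a small Python sketch, not Daffodil code), we can total the fixed lengths and confirm that 'c' begins on a byte boundary:

```python
# Fixed bit lengths of the elements preceding 'c', from the schema above.
lengths = {"x": 8, "a": 5, "b": 3}

# 1-based bit position where 'c' starts: one past the bits already written.
c_start = sum(lengths.values()) + 1
print(c_start)           # 17
print(c_start % 8 == 1)  # byte aligned: positions 1, 9, 17, ... begin bytes
```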

Now let's consider this infoset

<r>
  <x>255</x> <!-- hex FF binary 1111 1111-->
  <a>22</a> <!-- binary 10110 -->
  <b>7</b> <!-- binary 111 -->
  <c>6</c> <!-- binary 0110 -->
  <d>abc</d>
  ...
</r>

Now we are going to try to animate what happens as we unparse this based on the arrival of the infoset events to the unparser.

Our start state has a DataOutputStream which is 'direct' meaning attached to an actual Java JVM output stream. Because DFDL and Daffodil are very bit-oriented we implement bit-level I/O on top by accompanying this output-stream with a "frag byte", a single byte which contains from 0 to 7 bits which are the fragment of a byte that is not yet complete and ready to be written to the actual output stream. The frag byte never contains all 8 bits (except transiently) because when it has all 8 bits we write it to the byte stream and clear it.
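This frag-byte scheme can be sketched minimally for the MSBF case (illustrative Python, not Daffodil's actual Scala implementation; this ignores the buffering and suspension machinery discussed below):

```python
class MsbfBitOutput:
    """Byte stream plus a single MSBF frag byte holding 0..7 pending bits."""

    def __init__(self):
        self.whole_bytes = bytearray()  # completed bytes
        self.frag = 0                   # pending bits, packed toward the MSB
        self.frag_len = 0               # how many bits of frag are occupied

    def put_bits(self, value, width):
        """Append `width` bits of `value`, most significant bit first."""
        for i in range(width - 1, -1, -1):
            bit = (value >> i) & 1
            self.frag = (self.frag << 1) | bit
            self.frag_len += 1
            if self.frag_len == 8:      # frag is full: flush it immediately
                self.whole_bytes.append(self.frag)
                self.frag = 0
                self.frag_len = 0

out = MsbfBitOutput()
out.put_bits(255, 8)  # element 'x': whole byte 0xFF
out.put_bits(22, 5)   # element 'a': 10110 (pretending it were known)
out.put_bits(7, 3)    # element 'b': 111 -> completes byte 0xB7
print(out.whole_bytes.hex())  # "ffb7"
print(out.frag_len)           # 0 bits pending
```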

We'll depict a data output stream like so. This shows a data output stream where 4 whole bytes + 2 additional bits have been written.

On the left we have the JVM stream (or a buffer) holding whole bytes which we'll write in hex. On the right we have the frag byte which will eventually, once filled up, flow into the whole bytes part, at which point the frag byte will be reset. At the bottom of the frag byte we have the current bit order (shown as MSBF), and the number of bits in the fragment (shown as 2). The data in the frag byte illustrates the bits that are significant, with X for the bits as yet unoccupied by unparsed data.

So let's look at unparsing the infoset we see in the XML above.

We start from an empty data output stream. We will get a Start-Element 'r' event. No I/O occurs as this root element has nothing about it in the representation. So per below, the data output stream remains empty as here. Note that the bit position (1 based) is 1, but nothing has been output at that position. That is the position where output will begin.

We then get a Start-Element 'x' event with the value 255.

This is a whole byte, and the stream is currently byte aligned (because it is empty), so this data is output to the whole-bytes part of the data output stream:

This leaves us having written through bit 8, with the position at bit 9.

Next to unparse is element 'a'. However, this has dfdl:outputValueCalc, which depends on the infoset event for element 'd', which we don't have yet. So now the unparser works its magic and splits off a buffered data output stream. So below, the stream on the left is the direct stream, and the buffered stream is on the right.

Because we know the length (in bits) of element 'a' to be 5 bits, we know the starting position of the buffered stream. Logically, the first bit in the buffered stream is at bit 14.

Now we suspend the computation of element 'a' for later, but proceed to unparse element 'b', into the buffered stream. This results in:

We see that the 3 bits representing the value 7, 111, are in the frag byte of the buffered stream. The buffered stream now contains 3 bits. That means it ends at bit 16, with position at bit 17.

Now we get the start event for 'c'. However, element 'c' uses the LSBF bit order.

This should be fine, because the logical position in the bit stream is 17, which is 1 mod 8, i.e., byte aligned.

However, here is the subtle bug. We are not at a byte boundary in the buffered stream, because that stream did not start on a byte boundary. Rather, every byte boundary in the buffered stream is off by 3 bits from the byte boundaries of the actual data. We're at bit 4, in MSBF order, of the frag byte.

Let's look at the data in bits, using A for the bits not yet populated from element 'a':

1111 1111 | AAAA A111

That's the whole byte for element 'x', followed by the 5 bits for 'a' (unknown as yet), and the 3 bits for 'b'. Let's add to this sequence, the next byte, and using letter 'C' show which bits should be occupied by the value of element 'c'

1111 1111 | AAAA A111 | XXXX CCCC

Because element 'c' is LSBF, and we are starting a new byte, the bits for 'c' go in the least-significant bits of the third byte. The value is 6, which is 0110. So the bits should be:

1111 1111 | AAAA A111 | XXXX 0110
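The placement difference can be shown concretely (a Python sketch; the X bits are shown as 0 here): a 4-bit field at the start of a fresh byte lands in the high nibble under MSBF, but in the low nibble under LSBF:

```python
def place_msbf(value, width):
    # MSBF: the field occupies the most significant bits of the byte.
    return value << (8 - width)

def place_lsbf(value, width):
    # LSBF: the field occupies the least significant bits of the byte.
    return value

c = 0b0110  # element 'c' has value 6
print(f"{place_msbf(c, 4):08b}")  # 01100000 -> would be wrong for 'c'
print(f"{place_lsbf(c, 4):08b}")  # 00000110 -> XXXX 0110 with X = 0
```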

So, now considering our buffered data output stream above, we want to write out the bits of element 'c', but we can't yet flush the frag byte because we don't have all its bits yet. The byte boundary in the data falls after the three 111 bits in the frag byte. So the frag byte has room for the first 5 bits of the next data byte, but with only the 4 bits of 'c' in hand we can populate just 1 of them; the other 4 positions belong to the bits of element 'd', which we don't have yet.

So we actually need a second frag byte, or we have no place to put all the bits of 'c' in the data output stream.

But ultimately our algorithm and frag byte system have broken down here, as the bits in frag byte 1 are no longer contiguous. It contains 4 live bits and 4 unused, but the live bits are not adjacent. The second frag byte is supposed to be LSBF, but it is shown here MSBF so we can see how the bits are assembled so that they come out adjacent when combined with the bits in frag byte 1.

In addition, the next 4 bits to be output should go into the unused bits of frag byte 1, not frag byte 2. Only once those 4 bits are output, at which point we can flush frag byte 1 to the whole-bytes buffer, do we start writing to frag byte 2, which then becomes frag byte 1, and the old frag byte 2 is dropped.
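The broken invariant can be made concrete (a Python sketch, with frag-byte bit positions numbered 1..8 in MSBF order): frag byte 1 holds the three bits of 'b' in positions 1-3, and the single bit of 'c' that fits lands in position 8, so the live positions no longer form an unbroken run:

```python
# Live bit positions (1..8, MSBF) in frag byte 1 after trying to write 'c':
b_bits = {1, 2, 3}   # the three bits of 'b' (end of data byte 2)
c_bit = {8}          # the one bit of 'c' that fits (start of data byte 3)
live = sorted(b_bits | c_bit)
print(live)  # [1, 2, 3, 8]

# Contiguous would mean the live positions form an unbroken run:
contiguous = live == list(range(live[0], live[0] + len(live)))
print(contiguous)  # False: the frag-byte invariant is broken
```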

At this point I think it should be apparent that the algorithm is a confusing mess. The assumption that a frag byte always represents a byte-aligned byte of data is the invariant that ultimately is not being maintained. The design in place here is not up to the job of a buffered data output stream that starts off a byte boundary.

Ideas for possible fixes include

Starting Frag Byte

  • Add a starting frag byte to buffered streams.
    • This byte contains a partial byte to ensure that the whole bytes (buffered) are always on byte boundaries, and the frag byte(s) are always on byte boundaries.
    • This doesn't work. Because the starting position of a buffered data output stream isn't necessarily known. It is known in this example because element 'a' has fixed length, but if element 'a' had variable length we would have no idea where the starting position is, or where any byte boundaries actually occur in the data.
  • We could fix the above if we insist that OVC elements have fixed length. DFDL doesn't require this however, so it is not a particularly stable solution.

Split on Bit-order Change

  • Any time bit-order changes, split off yet another buffered data output stream. Record the bit order of every buffered output stream.
  • When eventually collapsing the data output streams together, check that we in fact end up changing bit orders on proper byte boundaries or issue a runtime SDE.
    • Probably need to save information on each buffered data output stream for diagnostic purposes in issuing this error. E.g., the Term's Runtime Data object.
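A sketch of the check that the "split on bit-order change" idea would need at collapse time (hypothetical Python; the names `BufferedStream`, `BitOrderSDE`, and `term_info` are illustrative, not Daffodil's actual API):

```python
class BitOrderSDE(Exception):
    """Runtime schema definition error: bit order changed off a byte boundary."""

class BufferedStream:
    def __init__(self, bit_order, n_bits, term_info):
        self.bit_order = bit_order  # "MSBF" or "LSBF", recorded at split time
        self.n_bits = n_bits        # known only once the stream is finished
        self.term_info = term_info  # saved for diagnostics, e.g. the Term's runtime data

def check_collapse(streams):
    """Verify every bit-order change falls on a byte boundary of the final data."""
    pos = 0  # absolute bit offset once the streams are laid end to end
    for prev, cur in zip(streams, streams[1:]):
        pos += prev.n_bits
        if cur.bit_order != prev.bit_order and pos % 8 != 0:
            raise BitOrderSDE(
                f"bit order changes at bit {pos} (not a byte boundary) at {cur.term_info}")

# The example from this page: x+a+b = 16 bits MSBF, then 'c' switches to LSBF.
streams = [BufferedStream("MSBF", 16, "x,a,b"), BufferedStream("LSBF", 4, "c")]
check_collapse(streams)  # 16 % 8 == 0: no error raised
```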

 

So let's go back to where we have just unparsed element 'b' and we're about to unparse element 'c'.

If we recognize that element 'c' has a different bit order, and we split off another buffered stream we get:

Now, the middle stream is marked FINAL, because we are splitting due to bit order change, not due to some additional suspended material to be added on later. The right-most stream contains the representation of 'c', confident that this stream begins on a byte boundary.

Now we unparse element 'd'. This is a string, and without getting into the details, we're going to say that it contains 28 bits: the characters 'abc' then a NUL. These characters are 7 bits wide each, and they are stored LSBF. In hex, 'a' is 0x61, 'b' is 0x62, 'c' is 0x63, and NUL is 0x00. First we unparse the 'a', getting 7 bits: 1100001. The frag byte has room for 4 more bits, and LSBF those are filled with the least significant 4 bits of this character, 0001. So the data output stream gets 0001 added to its frag byte for a full byte of 00010110, or hex 0x16. This is moved to the whole bytes, and the remaining 3 bits of the character, 110, are placed in the frag byte. So our streams now look like this:

By the same technique we output the next 3 characters, 'b', 'c', and the NUL (7 bits each), and we get:

At this point we have unparsed element 'd', so we now know its length: it is 28 bits long.
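We can replay the whole LSBF portion of the stream (a Python sketch of LSBF packing, not Daffodil code): the 4 bits of 'c' followed by the four 7-bit characters of 'd' total 32 bits, i.e. exactly 4 whole bytes, leaving the frag byte empty:

```python
def pack_lsbf(fields):
    """Pack (value, width) fields least-significant-bit-first into bytes."""
    acc = 0            # pending bits; newer bits sit at the high end
    n = 0              # number of pending bits
    out = bytearray()
    for value, width in fields:
        acc |= (value & ((1 << width) - 1)) << n
        n += width
        while n >= 8:  # a byte is complete: emit it
            out.append(acc & 0xFF)
            acc >>= 8
            n -= 8
    return out, acc, n  # whole bytes, frag bits, frag length

fields = [(6, 4),                                       # element 'c'
          (0x61, 7), (0x62, 7), (0x63, 7), (0x00, 7)]   # 'a' 'b' 'c' NUL
whole, frag, frag_len = pack_lsbf(fields)
print(whole.hex())  # "16168f01"
print(frag_len)     # 0: 4 + 4*7 = 32 bits fill 4 bytes exactly
```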

NEXT UP:  'a' outputValueCalc can be evaluated, and the whole thing can collapse to the right (illustrate this).

 
