Several algorithms, taken together, implement DFDL unparsing in Daffodil. The subject of this page is the buffering system of the I/O (really, output) layer.

DFDL has a feature expressed by the dfdl:outputValueCalc property (OVC for short). This property holds an expression in DFDL's expression language, and this expression typically refers to elements that are later in the infoset than the element whose declaration carries the OVC.

The Daffodil unparser is designed to support streaming behavior: the infoset arrives as a series of infoset events generated by a Daffodil InfosetInputter. Daffodil attempts to stream output data to the output stream without waiting for the entire infoset to arrive. Ideally, once an infoset element's start and end events have arrived, the element's representation could be written to the output stream.

However, the OVC feature complicates this. OVC is a tremendously powerful feature in DFDL which enables a DFDL schema to truly capture inter-dependencies between elements, typically when one element stores the length of another element, but the two elements are not adjacent in the infoset. For example, a length field may appear in a header part of the data, while the part of the data whose length it gives is represented much later.

OVC elements typically don't appear in the stream of infoset events. To support parsing data and then unparsing the resulting infoset, they are tolerated if they do appear in the infoset events, but their values are ignored and recomputed. For purposes of this discussion we'll assume that no events corresponding to OVC elements appear in the infoset events.

The unparser determines that an OVC element must be computed and added to the infoset, typically when an infoset event is encountered that is known to occur after the OVC element. When this is detected, an infoset element is added to the infoset corresponding to the OVC element, but this element has no value, and so it cannot be unparsed.

The OVC element's value is computed from a DFDL expression. This expression is evaluated in a special unparser-specific mode where forward-reference into the infoset is expected. If the expression can be evaluated and that evaluation returns a value, then the value is placed in the infoset element for the OVC, and unparsing then proceeds as if the element had arrived as infoset events with a value.

It is when the OVC element's expression cannot be evaluated that things get interesting. Typically an OVC element has an expression that refers to something later than it in the infoset; hence, the first attempt to evaluate the expression will fail. The evaluation mode throws specific exception types indicating why the expression could not be evaluated. These exceptions are caught, informing the unparser that the expression could not be successfully evaluated.

The unparser then creates a Suspension object. A suspension contains a copy of the unparser's state, and the expression. The suspension is queued for later evaluation when the unparser determines it should be retried. This could be as late as when the entire infoset has been created (by then every suspension must be able to be evaluated). Alternatively, the suspension could be queued on the first infoset element that prevented the expression from being evaluated; in that case, once that infoset element is updated, the suspension could be retried immediately.
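The suspend-and-retry cycle can be sketched roughly as follows. This is an illustrative Python model, not Daffodil's implementation: the names NotYetAvailable, Suspension, retry_all, and length_of_payload are hypothetical, the infoset is modeled as a plain dictionary, and the retry policy shown (retry everything that is queued) is just one of the options described above.

```python
from collections import deque


class NotYetAvailable(Exception):
    """Hypothetical signal: the expression needs infoset data that has not arrived."""


class Suspension:
    """Holds what is needed to retry an expression later (names are illustrative)."""

    def __init__(self, expression, state_snapshot):
        self.expression = expression          # callable taking the infoset
        self.state_snapshot = state_snapshot  # copy of unparser state at suspend time
        self.value = None
        self.done = False

    def try_evaluate(self, infoset):
        try:
            self.value = self.expression(infoset)
            self.done = True
        except NotYetAvailable:
            pass  # still blocked; the caller keeps it queued
        return self.done


def retry_all(pending, infoset):
    """Retry each queued suspension once; return the ones that are still blocked."""
    still_blocked = deque()
    while pending:
        s = pending.popleft()
        if not s.try_evaluate(infoset):
            still_blocked.append(s)
    return still_blocked


def length_of_payload(infoset):
    # Forward reference: fails until the 'payload' element has arrived.
    if "payload" not in infoset:
        raise NotYetAvailable("payload")
    return len(infoset["payload"])


infoset = {}
suspension = Suspension(length_of_payload, state_snapshot={"bit_pos": 0})
pending = deque()
if not suspension.try_evaluate(infoset):   # first attempt fails, so suspend
    pending.append(suspension)

infoset["payload"] = b"hello"              # a later infoset event arrives
pending = retry_all(pending, infoset)      # the retry now succeeds
```

In this sketch the exception type carries the reason for the failure, which mirrors the idea that the evaluation mode reports why the expression could not be evaluated.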

For purposes of this discussion we'll not worry about exactly when the suspensions are re-evaluated.

A key concept in suspending expressions for OVC elements is that unparsing must continue. It is not sufficient to just consume incoming infoset events until the infoset contains all the elements needed by the expression, because the expression can actually contain things like this:

Code Block
dfdl:outputValueCalc="{ dfdl:valueLength(../later/elem, 'bits') }"

This function, dfdl:valueLength, measures the length of the representation of the '../later/elem' element. Hence, that element of the infoset not only has to exist, but we must also unparse it into a buffer and measure the length of its value's representation.
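As a rough illustration of why buffering is needed, the following Python sketch holds back output behind a slot reserved for a suspended element, keeps unparsing later elements into buffers, and fills the slot once the later element's length can be measured. It is not Daffodil's I/O layer: the BufferChain class and its methods are hypothetical, and the sketch assumes the OVC field is a fixed-width 2-byte big-endian integer and that everything is byte-aligned.

```python
import io


class BufferChain:
    """Ordered output buffers; a None entry is a slot whose bytes are not known yet."""

    def __init__(self, sink):
        self.sink = sink     # the real output stream
        self.chunks = []     # bytes, or None for a suspended slot
        self.flushed = 0     # number of leading chunks already written to sink

    def reserve(self):
        """Reserve a slot for a suspended element and return its index."""
        self.chunks.append(None)
        return len(self.chunks) - 1

    def write(self, data):
        """Buffer the representation of an element and return its index."""
        self.chunks.append(bytes(data))
        self._flush()
        return len(self.chunks) - 1

    def fill(self, slot, data):
        """Supply the bytes for a previously reserved slot."""
        self.chunks[slot] = bytes(data)
        self._flush()

    def _flush(self):
        # Only a complete prefix can be written; a pending slot blocks what follows.
        # (A real implementation would also release chunks once they are flushed.)
        while self.flushed < len(self.chunks) and self.chunks[self.flushed] is not None:
            self.sink.write(self.chunks[self.flushed])
            self.flushed += 1


sink = io.BytesIO()
out = BufferChain(sink)
out.write(b"HDR")                        # streams straight through
length_slot = out.reserve()              # OVC element: value not known yet
elem = out.write(b"payload")             # later element, unparsed into a buffer
held_back = sink.getvalue()              # only b"HDR" has reached the sink so far

bit_length = len(out.chunks[elem]) * 8   # measure the buffered representation
out.fill(length_slot, bit_length.to_bytes(2, "big"))
```

Once the slot is filled, everything buffered behind it can be written to the sink in order.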

Ultimately, this means that after an OVC element is suspended, we must continue unparsing into buffers, and keep track, for each infoset element, of the start and end positions of the content and value regions of its representation. (In the representation, there are 2 start positions and 2 end positions for each element: content start, value start, value end, and content end. The content length is greater than or equal to the value length; it is strictly greater when padding is inserted.)
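The four positions and the two lengths derived from them can be sketched as below. This is a simplified Python model under stated assumptions: the Regions class and the unparse_padded helper are hypothetical names, positions are in bits, the data is byte-aligned, and padding is placed only after the value.

```python
from dataclasses import dataclass


@dataclass
class Regions:
    """Bit positions of one element's representation (field names are illustrative)."""
    content_start: int
    value_start: int
    value_end: int
    content_end: int

    def value_length(self):
        return self.value_end - self.value_start

    def content_length(self):
        return self.content_end - self.content_start


def unparse_padded(buf, text, width, pad=b" "):
    """Append text to buf, right-padded to width bytes; return its Regions in bits."""
    content_start = len(buf) * 8
    value_start = content_start          # padding goes after the value here
    buf += text
    value_end = len(buf) * 8
    buf += pad * max(0, width - len(text))
    content_end = len(buf) * 8
    return Regions(content_start, value_start, value_end, content_end)


buf = bytearray(b"\x00\x00")             # two bytes already unparsed
r = unparse_padded(buf, b"abc", width=8)
# r.value_length() is 24 bits; r.content_length() is 64 bits because of padding
```

With no padding the two regions coincide, which is the "equal" case of the inequality above.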

One simple optimization Daffodil uses is to keep track of the content and value start/end positions only for elements that actually appear in expressions.

(Idea for future: for debugging, it may be useful to compute these for every element, so as to be able to show a user exactly where the representation of every element is, and this applies to both parsing and unparsing.)

  a bug in bit-order handling when unparsing. This bug came up in Link16 when surrounded by NACT envelopes because NACT is bigEndian MSBF, and Link16 is littleEndian LSBF.

This example reproduces the problem.

...