...

  • The length of the OVC element may not be known; hence, the starting bit position of this added buffering DataOutputStream may not be known until unparsing of the OVC element's suspension has completed.
  • Alignment: Elements and model groups (terms generally) can have alignment, and text anywhere can have mandatory text alignment. When the starting bit position is not known, the size of the required alignment fill region cannot be computed.
    • This implies that non-zero alignment requires a split of the data output stream of its own - in the case where the starting bit position is not known.
  • Bit order: Elements can have dfdl:bitOrder, and model groups can have text (e.g., dfdl:initiator), and text implies a bit order, as each charset encoding has a specified bit order. It is not meaningful for the bit order to change except on a byte boundary (8-bit boundary). So, if the starting bit position of a buffering data output stream is not known, then the unparser cannot determine whether a bit order change is legal or not until that starting bit position has been determined.
    • This implies that bit order changes require a split of the data output stream of their own - in the case where the starting bit position is not known.
  • Interior Alignment affecting Length: The length of an element of complex type may depend on its starting bit position in the data output stream. The element's initial alignment is not part of its length, but the dependency arises because the terms it contains (aka "interior" terms, which may be elements or model groups) may themselves have alignment or mandatory text alignment. These alignment regions vary in size depending on where the term starts in the data output stream; hence, the length of a complex type may not be computable until its starting position is known and, recursively, the starting positions of any interior terms inside it are known.
    • This implies that expressions that compute the dfdl:contentLength or dfdl:valueLength of an element must potentially suspend until the starting bit positions become known so that the length of the alignment regions can be computed.
      • Hence, expressions can block not only on the values of infoset elements, but also on the ending bit position of the representation of infoset elements.
    • Circular deadlocks can occur if an OVC element needs the length of a later element, but the length of the later element depends (by way of this interior alignment issue) on the length of the OVC element.
      • Note: formats where an OVC element is itself a variable-length element are expected to be rare, but they are possible. In our experience, OVC elements most commonly have fixed lengths, because they mostly appear in binary data formats where the length fields are fixed-length binary integers. However, formats have been described where a length is expressed as a textual integer, which varies in size depending on the magnitude of the value, followed by a terminating delimiter. So variable-length OVC elements are possible, just uncommon.
  • Target length: Some elements have an explicit length which can be fixed, or given by a dfdl:length expression. When unparsing, this dfdl:length expression is evaluated to give a value known as the target length. This can differ from the value's implicit length in that the value may need to be padded to achieve the target length, or for xs:string only, the value may need to be truncated to fit within the target length.
    • TBD: For elements with explicit length, there is an element unused region at the end which may need to be filled (with dfdl:fillByte). For simple elements this would also be a difference between value and content length. For complex types. .......
    • There is commonly a circular dependency between an OVC element storing a length, and the element whose length it stores. Deadlock is avoided when unparsing because the value of the OVC element must depend only on the dfdl:valueLength (which excludes padding/filling), and so can be computed without reference to the target length of the element. The target length expression is then able to depend on the value of the OVC element and the circularity is avoided.
  • Expression Evaluation Modes: When unparsing, expressions can be evaluated in backward-only mode (just like parsing), or in forward-referencing mode, where they can block waiting for updates to the infoset (adding children, closing/finishing an infoset element to indicate that no more children will be added, setting a value, setting nilled, determining length, etc.).
    • Expressions can reference variables, whose values are assigned by way of dfdl:setVariable or dfdl:newVariableInstance expressions. These also can (TBD: must?) be evaluated in forward-referencing mode.
      • (TBD: Must? ... because we don't know if they'll be referenced from backward-only expressions or forward-referencing expressions of an OVC element, or recursively another variable value expression where the variable was referenced from an OVC element expression.)
  • Queuing Suspensions
    • Note: a quick and dirty implementation, which actually defeats streaming behavior, is to queue all suspensions centrally until the primary unparser pass is over, and then loop through the suspensions, retrying them until they all succeed.
    • Suspensions should be stored on the infoset elements they are blocked on. Infoset modifications (as values are added, lengths become known, or child elements are added) should generate events, and those events should trigger retries of the suspensions.
  • Pruning the Infoset: True streaming behavior requires that the parts of the infoset that are no longer needed by expressions, and that have already been unparsed, are dropped so that their memory can be recovered.
    • Some formats by their nature defeat streaming. For example, consider a format with a header that contains the length of the entire rest of the data. Such a header cannot be unparsed and emitted to the output stream until the length of the entire infoset can be computed; hence, at minimum, a buffer containing the entire unparsed representation has to exist temporarily to enable computing this length.
    • Other formats are easily stream-capable, for example formats that use only delimited length kind.
    • Formats with OVC elements are stream-capable within limits. Streaming is blocked for the span of the infoset and its representation, going from an OVC element to the infoset elements it forward references (and their representations). This much data must be buffered, but once those forward references can be resolved, the streaming can resume.
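
To make the alignment points above concrete, here is a minimal Python sketch of how the size of an alignment fill region, and therefore the length of a complex element with aligned interior terms, depends on the starting bit position. The function names, the 0-based bit positions, and the (alignment, length) pair representation of children are all assumptions made for illustration; none of this is taken from the actual implementation.

```python
from typing import List, Tuple


def alignment_fill_bits(start_bit_pos: int, alignment_bits: int) -> int:
    """Fill bits needed so the next term starts on a multiple of alignment_bits.

    Assumes 0-based bit positions (an illustrative convention only).
    """
    if alignment_bits <= 0:
        raise ValueError("alignment must be a positive number of bits")
    return (-start_bit_pos) % alignment_bits


def complex_length_bits(start_bit_pos: int,
                        children: List[Tuple[int, int]]) -> int:
    """Length of a complex element whose children are (alignment, length) pairs.

    The element's own initial alignment is not included; only the interior
    alignment fill between children contributes to the length.
    """
    pos = start_bit_pos
    for alignment_bits, length_bits in children:
        pos += alignment_fill_bits(pos, alignment_bits)  # interior fill
        pos += length_bits                               # the child itself
    return pos - start_bit_pos


# A 4-bit field followed by a byte-aligned 16-bit field.
children = [(1, 4), (8, 16)]
length_at_0 = complex_length_bits(0, children)  # 4 bits of fill are needed
length_at_4 = complex_length_bits(4, children)  # no fill is needed
```

With the same children, the element is 24 bits long when it starts at bit 0 and 20 bits long when it starts at bit 4, which is why a length expression may have to wait until the starting bit position is known.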
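
The "quick and dirty" queuing approach described above can be sketched as a central retry loop. This is a hedged illustration only: the names run_suspensions and DeadlockError, and the convention that a suspension is a callable returning True once it has completed, are hypothetical. The no-progress check is one assumed way of surfacing the circular deadlocks mentioned earlier, not a description of how any implementation detects them.

```python
from typing import Callable, Dict, List


class DeadlockError(Exception):
    """Raised when a full retry pass completes no suspension."""


def run_suspensions(suspensions: List[Callable[[], bool]]) -> int:
    """Retry every suspension until all succeed; return the number of passes.

    Each suspension returns True when it completed, or False if it is
    still blocked on information that is not yet available.
    """
    pending = list(suspensions)
    passes = 0
    while pending:
        passes += 1
        still_blocked = [s for s in pending if not s()]
        if len(still_blocked) == len(pending):
            # No suspension made progress, so further retries cannot help.
            raise DeadlockError(f"{len(pending)} suspension(s) cannot complete")
        pending = still_blocked
    return passes


# Hypothetical example: a length field that forward-references a payload.
infoset: Dict[str, int] = {}
output: Dict[str, int] = {}


def unparse_length_field() -> bool:
    if "payload_length" not in infoset:
        return False  # still blocked on the later element's length
    output["length_field"] = infoset["payload_length"]
    return True


def compute_payload_length() -> bool:
    infoset["payload_length"] = len("hello")
    return True


passes = run_suspensions([unparse_length_field, compute_payload_length])
```

Here the length field is blocked on the first pass and succeeds on the second. An event-driven design, with suspensions stored on the infoset elements they wait for, would avoid re-running suspensions that cannot yet make progress.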

...