Proposal: DFDLX lookAhead

When parsing some data formats, it is necessary to consider data that occurs at a later point in the bitstream. For instance, consider a simple fixed-length tagged union where the tag occurs after the union. Conceptually, such a format may be described by:

<xs:choice dfdl:choiceDispatchKey="{ tag }">
  <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
  <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
</xs:choice>
<xs:element name="tag" type="xs:int" dfdl:length="8" />

An existing proposal (Proposal: DFDL base plus offset feature - Enables describing TIFF) would allow for this by making it possible to put the <tag> element first in the infoset, despite it occurring later in the bitstream. However, that proposal imposes unnecessary complexity for such a use case. In particular, the schema must explicitly specify jumps forward and backward in the bitstream. Further, the full generality of that approach involves considering additional issues surrounding unparsing (such as overlapping data).

For basic use cases such as the above, support can instead be added in a much simpler manner, by providing lookahead capabilities directly in DPath:

<xs:choice dfdl:choiceDispatchKey="{ dfdlx:lookAhead(16,8) }">
  <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
  <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
</xs:choice>
<xs:element name="tag" type="xs:int" dfdl:length="8" />

Another potential solution would be to allow forward references in DPath expressions during parsing, provided the compiler can prove that such a forward reference is resolvable (e.g., the portion of content being skipped over is of constant length). However, doing so would add significant complexity to both Daffodil and DFDL.

This proposal is to add the dfdlx:lookAhead function to DPath.

dfdlx:lookAhead

dfdlx:lookAhead(distance, bitSize)
- Reads bitSize bits, where the first bit is located at an offset of distance bits from the current location.

Restrictions
- distance >= 0
- bitSize >= 0
- distance + bitSize <= an implementation-defined limit, which must be no less than 512 bits
- Cannot be called during unparse.
- A parse error occurs if the lookahead extends past EOF.
- Behavior is undefined if the lookahead extends past the document boundary in streaming mode.
- bitOrder and byteOrder are determined by the current location. Changes to either property between the current location and the location of the data being read are not respected.
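
The semantics and restrictions above can be modelled with a short Python sketch. It is only an illustration of the intended behavior, not Daffodil's implementation. The names look_ahead, LookAheadError and MAX_LOOKAHEAD_BITS are hypothetical, and the sketch assumes a most-significant-bit-first bit order, big-endian byte order, an unsigned result, and a limit of exactly 512 bits.

```python
class LookAheadError(Exception):
    """Stands in for the error an implementation would raise (hypothetical name)."""


# Assumed value: the proposal only requires the limit to be at least 512 bits.
MAX_LOOKAHEAD_BITS = 512


def look_ahead(data: bytes, bit_pos: int, distance: int, bit_size: int) -> int:
    """Return bit_size bits starting distance bits past bit_pos.

    The caller's position (bit_pos) is never advanced; this is a pure peek.
    Assumes most-significant-bit-first order and an unsigned result.
    """
    if distance < 0 or bit_size < 0:
        raise LookAheadError("distance and bitSize must be non-negative")
    if distance + bit_size > MAX_LOOKAHEAD_BITS:
        raise LookAheadError("lookahead exceeds the implementation-defined limit")

    end = bit_pos + distance + bit_size
    total_bits = len(data) * 8
    if end > total_bits:
        raise LookAheadError("lookahead extends past the end of the data")

    # Treat the buffer as one big-endian integer, then isolate the requested bits.
    whole = int.from_bytes(data, "big")
    return (whole >> (total_bits - end)) & ((1 << bit_size) - 1)
```

For example, with the three bytes 0x12 0x34 0x02 and the current location at bit 0, look_ahead(data, 0, 16, 8) skips the first 16 bits and returns the value of the third byte, while the current location stays where it was.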

Examples

The following two elements are equivalent:

<xs:element name="a" type="xs:unsignedInt" dfdl:length="3" dfdl:lengthUnits="bits" />
<xs:element name="a" type="xs:unsignedInt" dfdl:length="3" dfdl:lengthUnits="bits" dfdl:inputValueCalc="{ dfdlx:lookAhead(0,3) }" />

The following example demonstrates using lookAhead to branch based on a field in the future:

<xs:choice dfdl:choiceDispatchKey="{ dfdlx:lookAhead(16,8) }">
  <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
  <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
</xs:choice>
<xs:element name="tag" type="xs:int" dfdl:length="8" />
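
As a rough illustration of what a parser does with this schema, the following Python sketch peeks at the tag, selects the branch, and only then parses the payload and the tag in bitstream order. The function name parse_tagged_union is hypothetical, and the sketch assumes binary two's-complement integers in big-endian byte order with lengths in bits, none of which is stated by the schema fragment itself.

```python
def parse_tagged_union(data: bytes) -> dict:
    """Parse a 16-bit payload followed by an 8-bit tag.

    The branch is chosen by peeking at the tag before the payload is parsed,
    mirroring dfdlx:lookAhead(16, 8) in the choiceDispatchKey.
    """
    if len(data) < 3:
        raise ValueError("need 16 payload bits plus 8 tag bits")

    # Peek: the tag starts 16 bits ahead and is byte-aligned here, so it is byte 2.
    dispatch_key = data[2]

    branches = {1: "a", 2: "b"}  # choiceBranchKey -> element name
    if dispatch_key not in branches:
        raise ValueError(f"no choice branch for key {dispatch_key}")

    # Parse the selected branch from the first 16 bits (assumed signed, big-endian).
    payload = int.from_bytes(data[0:2], "big", signed=True)

    # The tag is still parsed as its own element after the choice.
    tag = data[2]
    return {branches[dispatch_key]: payload, "tag": tag}
```

Note that the tag appears in the result after the payload, matching its position in the bitstream; the lookahead only influences which branch is taken.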
