This proposal was implemented as part of Daffodil 2.5.0.

This page is linked from https://s.apache.org/daffodil-blob-feature. If this page content moves, please update that link from https://s.apache.org.

...

Additionally, logic must be created to remove BLOB files if Daffodil backtracks past an already created BLOB. This can be handled by storing the list of created BLOB files in the PState and, upon backtracking, deleting the appropriate files in the list before resetting back to the earlier state.
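A minimal sketch of the backtracking cleanup follows. It is written in Java for illustration (Daffodil itself is written in Scala). `BlobTracker`, `mark`, and `resetTo` are hypothetical names, not the PState API. The sketch assumes a mark is taken at each point of uncertainty and that resetting to that mark deletes every BLOB file created after it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Daffodil's PState API): tracks BLOB files created
 * during a parse so that files created after a point of uncertainty can be
 * deleted when the parser backtracks past them.
 */
final class BlobTracker {
    private final List<Path> blobPaths = new ArrayList<>();

    /** Record a newly created BLOB file. */
    void add(Path blob) {
        blobPaths.add(blob);
    }

    /** Capture a mark (the number of BLOBs tracked so far) before a speculative parse. */
    int mark() {
        return blobPaths.size();
    }

    /**
     * Backtrack to a previously captured mark: remove every BLOB created after
     * the mark from the list (newest first) and delete its file.
     */
    void resetTo(int mark) throws IOException {
        for (int i = blobPaths.size() - 1; i >= mark; i--) {
            Files.deleteIfExists(blobPaths.remove(i));
        }
    }

    /** Read-only snapshot of the BLOBs that survived backtracking. */
    List<Path> paths() {
        return List.copyOf(blobPaths);
    }
}
```

Because the marks are just list sizes, taking a mark is cheap. Files are only touched when a backtrack actually discards BLOBs.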

...

  1. Get the URI from the infoset and determine the length of the file it references. If the length cannot be determined, throw an UnparseError.
  2. As with hexBinary, determine the content length, and throw an UnparseError if the BLOB file length is larger than the content length.
  3. Open the file using a FileInputStream. If opening the file fails, throw an UnparseError.
  4. Read bytes from the FileInputStream and write them to the UState dataOutputStream. Read in small fixed-size chunks to minimize the total memory required and to support more than 2GB of data. If an IOException occurs at any point, throw an UnparseError.
  5. As with hexBinary, write skip bits if the content length is not filled completely.
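The five unparse steps can be sketched as follows. This is a simplified, hypothetical Java version, not Daffodil's unparser. `BlobUnparser`, `unparseBlob`, and `UnparseException` (a stand-in for UnparseError) are assumed names. The sketch also makes three simplifying assumptions:

* It works in whole bytes.
* It writes a caller-supplied fill byte where Daffodil would write skip bits.
* It takes a `Path` that was already obtained from the infoset URI, for example via `Paths.get(uri)`.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical stand-in for Daffodil's UnparseError. */
final class UnparseException extends Exception {
    UnparseException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

final class BlobUnparser {
    // Fixed chunk size keeps memory bounded regardless of BLOB size.
    static final int CHUNK_SIZE = 4096;

    static void unparseBlob(Path blob, long contentLength, byte fillByte, OutputStream out)
            throws UnparseException {
        // Step 1: determine the BLOB file length.
        final long fileLength;
        try {
            fileLength = Files.size(blob);
        } catch (IOException e) {
            throw new UnparseException("Unable to determine length of BLOB " + blob, e);
        }
        // Step 2: the BLOB must fit within the content length.
        if (fileLength > contentLength) {
            throw new UnparseException("BLOB length " + fileLength
                    + " exceeds content length " + contentLength, null);
        }
        long written = 0;
        byte[] buf = new byte[CHUNK_SIZE];
        // Step 3: a failure to open the file surfaces as an IOException below.
        try (InputStream in = new FileInputStream(blob.toFile())) {
            // Step 4: copy in chunks; the long counter stays correct past 2 GB.
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                written += n;
            }
            // Step 5: fill whatever part of the content length remains.
            Arrays.fill(buf, fillByte);
            long remaining = contentLength - written;
            while (remaining > 0) {
                int len = (int) Math.min(remaining, buf.length);
                out.write(buf, 0, len);
                remaining -= len;
            }
        } catch (IOException e) {
            throw new UnparseException("I/O error while unparsing BLOB " + blob, e);
        }
    }
}
```

For example, a 3-byte BLOB unparsed into a 5-byte content length produces the 3 file bytes followed by 2 fill bytes.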

...

This proposal does allow access to the length of a BLOB element. This is almost certainly needed, since data formats commonly include both a BLOB payload and the length of that payload. On unparse, we almost certainly need the ability to calculate the length of the BLOB data so that the value can be output in a length field in the data. Fortunately, the dfdl:contentLength and dfdl:valueLength functions do not actually query the data. Instead, they query bit positions stored in the infoset. Thus, no changes should be necessary to support this.
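The following hypothetical Java sketch shows why BLOBs need no special handling here. `ElementPositions` and its method names are assumptions, not Daffodil's infoset API. It derives a length purely from recorded start and end bit positions, and nothing in it opens or reads a BLOB file. Requiring a whole number of bytes for the byte length is a simplification made only for this sketch.

```java
/**
 * Hypothetical sketch (not Daffodil's infoset API): an infoset element that
 * remembers where its value starts and ends in the data, as 0-based bit
 * positions. Lengths are computed from these positions alone.
 */
final class ElementPositions {
    private final long valueStartBitPos;
    private final long valueEndBitPos;

    ElementPositions(long valueStartBitPos, long valueEndBitPos) {
        if (valueEndBitPos < valueStartBitPos) {
            throw new IllegalArgumentException("end position precedes start position");
        }
        this.valueStartBitPos = valueStartBitPos;
        this.valueEndBitPos = valueEndBitPos;
    }

    /** Value length in bits: plain arithmetic on stored positions, no data access. */
    long valueLengthBits() {
        return valueEndBitPos - valueStartBitPos;
    }

    /** Value length in bytes; this sketch requires a whole number of bytes. */
    long valueLengthBytes() {
        long bits = valueLengthBits();
        if (bits % 8 != 0) {
            throw new IllegalStateException("length is not a whole number of bytes: " + bits + " bits");
        }
        return bits / 8;
    }
}
```

For a 1000-byte BLOB whose value starts at bit 64, the element records an end position of 64 + 8000, so the length is 8000 bits (1000 bytes) regardless of how large the BLOB file is.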

...

  1. Use the new API to specify a temporary directory in which BLOBs are stored.
      
  2. Perform type-aware comparisons for the xs:anyURI type, similar to what we do now for xs:date, xs:dateTime, and xs:time. Type awareness will be enabled by using the xsi:type attribute on the expected infoset, since Daffodil does not yet support adding xsi:type information to the actual infoset. An example looks something like:

    <tdml:dfdlInfoset>
      <data xsi:type="xs:anyURI">path/to/blob/data</data>
    </tdml:dfdlInfoset>

    During type-aware comparisons, the TDML Runner will extract the path and resolve it (e.g., find the file and convert it to an absolute path in the infoset). It will use logic similar to that used to find files referenced with the type="file" attribute on expected infosets. Once the expected file is found, the runner will compare its contents with the contents of the file at the URI specified in the actual infoset and report any differences as usual.
      

  3. After a test completes, delete all BLOB files listed in the InfosetOutputter.
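Steps 2 and 3 of the TDML Runner changes could look roughly like the Java sketch below. `BlobTestSupport`, `resolveExpected`, `actualPath`, `compareBlobs`, and `deleteBlobs` are all hypothetical names. Resolving the expected path against a base directory is an assumption that stands in for the existing type="file" lookup logic. `Files.mismatch` requires Java 12 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical TDML Runner helpers for BLOB-aware tests (not Daffodil's API). */
final class BlobTestSupport {
    private BlobTestSupport() {}

    /** Step 2 (assumed lookup): resolve the expected infoset's relative path to an absolute one. */
    static Path resolveExpected(Path baseDir, String relativePath) {
        return baseDir.resolve(relativePath).toAbsolutePath().normalize();
    }

    /** Step 2: convert the xs:anyURI value from the actual infoset (a file: URI) to a Path. */
    static Path actualPath(String uri) {
        return Paths.get(URI.create(uri));
    }

    /** Step 2: compare BLOB contents; null means identical, otherwise describe the first difference. */
    static String compareBlobs(Path expected, Path actual) throws IOException {
        long pos = Files.mismatch(expected, actual); // -1 when the contents are identical
        if (pos == -1L) {
            return null;
        }
        return "BLOB contents differ at byte " + pos + ": expected " + expected + ", actual " + actual;
    }

    /** Step 3: after a test completes, delete every BLOB file the InfosetOutputter reported. */
    static void deleteBlobs(List<Path> blobs) throws IOException {
        for (Path blob : blobs) {
            Files.deleteIfExists(blob);
        }
    }
}
```

Returning a description instead of a boolean lets the runner fold a BLOB mismatch into its usual infoset difference report.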

...