(Page is work in progress.)

See also: https://rmw42.wordpress.com/2018/05/23/growing-dfdls/ (Russ Williams' ideas related to BLOBs and to solving related problems).

DFDL needs an extension that allows data much larger than memory to be manipulated.

A variety of data formats, such as image and video file formats, consist of fields of what is effectively metadata surrounding large blocks of compressed image or video data.

An important use case for DFDL is to expose this metadata for easy use, and to provide access to the large data via a streaming mechanism akin to opening a file.

In relational databases (RDBMSs), BLOB (Binary Large Object) and CLOB (Character Large Object) are the types used when a row returned from an SQL query does not contain the actual value, but rather a handle that can be used to open, read, write, and close the BLOB or CLOB.
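For comparison, JDBC exposes this handle pattern through the standard java.sql.Blob interface: length() reports the size without reading the data, getBinaryStream() opens the contents as a stream, and free() releases the handle. The sketch below reads a BLOB in fixed-size chunks so the whole value is never requested as one byte array. It uses javax.sql.rowset.serial.SerialBlob only as an in-memory stand-in; a real JDBC driver would return a database-backed handle. The class and method names are illustrative, not part of any DFDL design.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

public class JdbcBlobHandleDemo {
    // Reads a BLOB through its handle in 8 KiB chunks, never asking for the
    // whole value as a single byte[]. Returns the number of bytes read.
    static long countBytes(Blob blob) throws SQLException, IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = blob.getBinaryStream()) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } finally {
            blob.free(); // release whatever resources the handle holds
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // SerialBlob is an in-memory Blob used here only as a stand-in for a
        // handle that a JDBC driver would return from a query.
        Blob blob = new SerialBlob(new byte[100_000]);
        // length() is evaluated before countBytes() frees the handle.
        System.out.println(blob.length() + " " + countBytes(blob)); // 100000 100000
    }
}
```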

DFDL needs at least BLOB capability. This would enable processing of images or video of arbitrary size without ever needing to hold all the data in memory.

This would also remove the limit on the size of a single object.

BLOB Feature for DFDL

Data objects larger than a single JVM object can store (e.g., video or images) may have to be represented in the Infoset by a proxy object. Standard streaming-style events normally deliver each simple value as a regular object containing that value. If a simple value is larger than a single JVM object can store, a streaming API for accessing the value is needed instead.
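The size limit is concrete on the JVM: arrays are indexed by int, so an Array[Byte] (a Java byte[]) cannot hold more than Integer.MAX_VALUE bytes. The following sketch shows one possible policy for deciding when a hexBinary value has to be replaced by a proxy. BlobProxy, needsProxy, and the threshold parameter are hypothetical names, not part of DFDL or any existing API.

```java
import java.nio.file.Path;

public class BlobProxyDemo {
    // Hypothetical proxy standing in for a hexBinary value in the infoset:
    // it records where the bytes live instead of holding them.
    record BlobProxy(Path file, long offset, long length) {}

    // Java arrays are int-indexed; many JDK classes treat
    // Integer.MAX_VALUE - 8 as the largest safe array length.
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE - 8;

    // Hypothetical policy: materialize small values as byte[], proxy the rest.
    // 'threshold' is an assumed tuning knob, not a DFDL property.
    static boolean needsProxy(long length, long threshold) {
        return length > Math.min(threshold, MAX_ARRAY_BYTES);
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        System.out.println(needsProxy(1024, 1 << 20));           // false: small value
        System.out.println(needsProxy(fourGiB, Long.MAX_VALUE)); // true: exceeds any byte[]
        BlobProxy p = new BlobProxy(Path.of("video.mp4"), 512, fourGiB);
        System.out.println(p.length());                          // 4294967296
    }
}
```

Whatever the threshold, a value longer than the array limit can never be materialized, which is why the proxy is needed for correctness and not only for memory use.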

BLOBs in the DFDL Infoset

The DFDL Infoset does not specify the representation of the [value] member for a hexBinary element; that is, it does not specify the API for accessing the value. Currently it is an Array[Byte], but other abstractions could be provided. Similarly, the [value] member for type xs:string is assumed to be a java.lang.String, but other abstractions could be provided. Hence the BLOB problem differs depending on how the infoset is being accessed.

These handle objects would support opening and reading the contents of large objects as a java.nio.channels channel (e.g., ReadableByteChannel) or a java.io.InputStream (for hexBinary), and as a java.io.Reader (for strings). For unparsing, the basic mechanisms are the symmetric ones: writable channels, java.io.OutputStream, or java.io.Writer.
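A minimal sketch of such a handle, assuming the BLOB is a byte range of a file on disk. The hypothetical BlobHandle record opens the range as a java.io.InputStream using positional FileChannel reads, and wraps that stream in a Reader for character data. UTF-8 is an assumption here; a real implementation would use the encoding in effect for the element. Names and structure are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlobHandleStreams {
    // Hypothetical handle: the byte range [offset, offset + length) of a file.
    record BlobHandle(Path file, long offset, long length) {

        // Binary access (hexBinary): an InputStream over just that range.
        InputStream openInputStream() throws IOException {
            FileChannel ch = FileChannel.open(file, StandardOpenOption.READ);
            return new InputStream() {
                private long pos = offset;
                private final long end = offset + length;

                @Override public int read() throws IOException {
                    byte[] one = new byte[1];
                    int n = read(one, 0, 1);
                    return n == -1 ? -1 : (one[0] & 0xFF);
                }

                @Override public int read(byte[] b, int off, int len) throws IOException {
                    if (len == 0) return 0;
                    if (pos >= end) return -1;
                    int want = (int) Math.min(len, end - pos);
                    // Positional read: does not move the channel's own position.
                    int n = ch.read(ByteBuffer.wrap(b, off, want), pos);
                    if (n == -1) return -1; // file is shorter than the handle claims
                    pos += n;
                    return n;
                }

                @Override public void close() throws IOException { ch.close(); }
            };
        }

        // Character access (xs:string): decode the same range as a Reader.
        Reader openReader() throws IOException {
            return new InputStreamReader(openInputStream(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blob", ".bin");
        Files.write(tmp, "HEADERpayload-bytesTRAILER".getBytes(StandardCharsets.UTF_8));
        BlobHandle h = new BlobHandle(tmp, 6, 13); // covers "payload-bytes"
        try (InputStream in = h.openInputStream()) {
            System.out.println(in.readAllBytes().length); // 13
        }
        StringBuilder text = new StringBuilder();
        try (Reader r = h.openReader()) {
            int c;
            while ((c = r.read()) != -1) text.append((char) c);
        }
        System.out.println(text); // payload-bytes
        Files.delete(tmp);
    }
}
```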

BLOBs in XML

When the DFDL infoset is projected into XML, each handle object would have to appear as its own XML serialization, with members that other software can use to reach the data the handle refers to. For example, the handle could contain a file name or URI, an offset (type Long) into that file, a length (type Long), and possibly the first N bytes or characters of the data.
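A sketch of what such an XML serialization might look like. The element name, the attribute names blobURI, offset, length, and prefix, and the choice of attributes rather than child elements are all assumptions for illustration; nothing here is defined by the DFDL specification. The prefix carries the first few bytes as hexBinary text.

```java
import java.util.HexFormat;

public class BlobXml {
    // Escape characters that may not appear verbatim in an XML attribute value.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical serialization of a BLOB handle; names are illustrative,
    // not part of DFDL or any Daffodil API.
    static String toXml(String elementName, String uri, long offset, long length,
                        byte[] firstBytes) {
        return "<" + elementName
             + " blobURI=\"" + escapeAttr(uri) + "\""
             + " offset=\"" + offset + "\""
             + " length=\"" + length + "\""
             + " prefix=\"" + HexFormat.of().withUpperCase().formatHex(firstBytes) + "\"/>";
    }

    public static void main(String[] args) {
        byte[] head = {0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70};
        System.out.println(toXml("videoData", "file:///data/clip.mp4?v=1&x=2",
                                 1024, 5_000_000_000L, head));
    }
}
```

Running main prints `<videoData blobURI="file:///data/clip.mp4?v=1&amp;x=2" offset="1024" length="5000000000" prefix="0000001866747970"/>`. Note that the length exceeds the int range, which is why the offset and length members need to be of type Long.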

Lifecycle of a BLOB

  • When can a BLOB be accessed?
  • When is access to data of a BLOB lost?

Implementation Concerns

  • Want to avoid copying the BLOB data to a file when possible.
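One reading of this concern, sketched under assumptions: when the parser input is already a seekable file, a handle can simply point at the region inside that file, and a copy is only needed when the input is a one-pass stream whose bytes are gone once parsing moves past them. BlobRef, referenceInPlace, and spoolToTempFile are hypothetical names; this is a possible design, not a decided one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlobPlacement {
    // Hypothetical handle: where the BLOB bytes live and whether they were copied.
    record BlobRef(Path file, long offset, long length, boolean copied) {}

    // Input is already a seekable file: record the region in place, copy nothing.
    static BlobRef referenceInPlace(Path input, long offset, long length) {
        return new BlobRef(input, offset, length, false);
    }

    // Input is a one-pass stream (pipe, socket): the bytes are lost once the
    // parser moves past them, so spool exactly 'length' bytes to a temp file.
    static BlobRef spoolToTempFile(InputStream in, long length) throws IOException {
        Path tmp = Files.createTempFile("dfdl-blob", ".bin");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) {
                    throw new IOException("input ended " + remaining + " bytes early");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial copy behind
            throw e;
        }
        return new BlobRef(tmp, 0, length, true);
    }

    public static void main(String[] args) throws IOException {
        // "input.dat" is a hypothetical path; it is not opened here.
        BlobRef inPlace = referenceInPlace(Path.of("input.dat"), 4096, 1L << 33);
        System.out.println(inPlace.copied()); // false

        InputStream pipeLike = new ByteArrayInputStream(
                "payloadTRAILER".getBytes(StandardCharsets.US_ASCII));
        BlobRef spooled = spoolToTempFile(pipeLike, 7);
        System.out.println(spooled.copied() + " " + Files.size(spooled.file())); // true 7
        Files.delete(spooled.file());
    }
}
```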