...

Large atomic objects of type xs:string and xs:hexBinary cannot be turned into ordinary Java String and Array[Byte] values. Instead, they must be represented by some kind of small handle or proxy object. A tunable threshold should be available to tell Daffodil when to create a handle rather than an ordinary String or Array[Byte].
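As a minimal sketch of the threshold idea, the decision could be isolated in one small policy object. The class name LargeValuePolicy, the method useHandle, and the choice to measure the threshold in bytes are all assumptions made for illustration; they are not part of any existing Daffodil API.

```java
/** Hypothetical policy: decides whether a simple value gets a handle. */
final class LargeValuePolicy {
    private final long thresholdBytes;

    /** thresholdBytes is the tunable limit (assumed here to be in bytes). */
    LargeValuePolicy(long thresholdBytes) {
        if (thresholdBytes < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        this.thresholdBytes = thresholdBytes;
    }

    /** Returns true when a value of this length should be a handle, not a String or byte array. */
    boolean useHandle(long lengthInBytes) {
        return lengthInBytes > thresholdBytes;
    }
}
```

With a threshold of 1024, a 1024-byte value would still be materialized as an ordinary object, and a 1025-byte value would be represented by a handle.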

Data objects larger than a single JVM object can store (e.g., video or images) may have to be represented in the Infoset by a proxy object. Standard streaming-style events normally produce simple values as regular objects representing the value. If a simple value is larger than a single JVM object can store, then a streaming API to access the value is needed.

The DFDL Infoset doesn't really specify what the [value] member is for a hexBinary object; that is, it does not specify the API for accessing this value. Currently it is Array[Byte], but we can provide other abstractions. Likewise, the [value] member for type xs:string is assumed to be a java.lang.String, but we can provide other abstractions.

These handle objects would support the ability to open and access the contents of these large objects as a java.nio.channels.Channel or java.io.InputStream (for hexBinary), and as a java.io.Reader (for String).

When projecting the DFDL infoset into XML, these handle objects would have to show up as the XML serialization of the handle object, with usable members so that other software can access the data the handle is referring to. One example would be that the handle contains a fileName or URI and an offset (type Long) into it, and a length (type Long), and possibly the first N bytes/characters of the data.
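One possible shape for such a serialization is sketched below. The class name BlobRef and the element names blobRef, uri, offset, and length are invented for illustration only; the actual XML representation is not decided here, and the optional preview of the first N bytes/characters is omitted to keep the sketch short.

```java
import java.net.URI;

/** Hypothetical handle carrying a location, an offset, and a length. */
final class BlobRef {
    private final URI location;
    private final long offset;
    private final long length;

    BlobRef(URI location, long offset, long length) {
        this.location = location;
        this.offset = offset;
        this.length = length;
    }

    /** Renders the handle as XML; element names are assumptions. */
    String toXml() {
        return "<blobRef>"
            + "<uri>" + escape(location.toString()) + "</uri>"
            + "<offset>" + offset + "</offset>"
            + "<length>" + length + "</length>"
            + "</blobRef>";
    }

    /** Escapes the characters that are significant in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Software consuming the XML infoset could then read the three members and fetch the referenced bytes itself, without the large value ever being embedded in the XML.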

The traditional name for these handle objects is BLOB (Binary Large Object) or CLOB (Character Large Object). The notion is that one gets these handles, and to actually access the data they represent you must "open" them, process their contents using a file/stream type of mechanism, and close them.
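The open/process/close pattern might look roughly like the following sketch for the hexBinary case. BlobHandle and its members are hypothetical names, and backing the handle with a file region (path, offset, length) is just one assumed storage strategy. Only standard java.nio and java.io calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical BLOB handle referring to a region of a file. */
final class BlobHandle {
    private final Path file;
    private final long offset;
    private final long length;

    BlobHandle(Path file, long offset, long length) {
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    long length() {
        return length;
    }

    /** Opens a stream over bytes [offset, offset + length). The caller must close it. */
    InputStream open() throws IOException {
        SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ);
        try {
            channel.position(offset);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
        InputStream raw = Channels.newInputStream(channel);
        return new InputStream() {
            private long remaining = length;

            @Override
            public int read() throws IOException {
                if (remaining <= 0) return -1;
                int b = raw.read();
                if (b >= 0) remaining--;
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (len == 0) return 0;
                if (remaining <= 0) return -1;
                int n = raw.read(buf, off, (int) Math.min(len, remaining));
                if (n > 0) remaining -= n;
                return n;
            }

            @Override
            public void close() throws IOException {
                raw.close(); // also closes the underlying channel
            }
        };
    }
}
```

A consumer would wrap the call in try-with-resources, for example `try (InputStream in = handle.open()) { ... }`, so that the handle's data is opened, processed as a stream, and closed without ever being held in memory as a whole. A CLOB handle could follow the same pattern with a java.io.Reader.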

See the BLOB objects proposal.

This mechanism needs to work for both parsing and unparsing; hence, an API for constructing these large-data handle objects is needed.

Infoset Events

Infoset elements must be produced incrementally by the parser. These can only be produced once surrounding points of uncertainty are resolved fully. An architecture for this is needed. There may be some limitations.
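One way to picture the constraint is a buffer that holds events back while any point of uncertainty is open. This is an illustrative sketch only, not a proposed architecture: InfosetEventBuffer, its method names, and the use of plain strings as events are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer that releases events only when no point of uncertainty is open. */
final class InfosetEventBuffer {
    private final Consumer<String> sink;
    private final List<String> pending = new ArrayList<>();
    // Each mark records how many events were pending when a point of uncertainty began.
    private final Deque<Integer> marks = new ArrayDeque<>();

    InfosetEventBuffer(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Delivers the event immediately, or holds it if the parse is still speculative. */
    void emit(String event) {
        if (marks.isEmpty()) {
            sink.accept(event);
        } else {
            pending.add(event);
        }
    }

    void enterPointOfUncertainty() {
        marks.push(pending.size());
    }

    /** The innermost point of uncertainty succeeded; flush once none remain open. */
    void resolve() {
        marks.pop();
        if (marks.isEmpty()) {
            pending.forEach(sink);
            pending.clear();
        }
    }

    /** The innermost point of uncertainty failed; discard the events it produced. */
    void backtrack() {
        int mark = marks.pop();
        pending.subList(mark, pending.size()).clear();
    }
}
```

The sketch also makes one limitation visible: events produced inside a long-running point of uncertainty accumulate in memory until it is resolved, so streaming is only as incremental as the schema's points of uncertainty allow.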

...