...

Layering - Data Source/Target Indirection

(See concrete Proposal.) Note: this is being implemented in Daffodil; a version was committed to the master branch in April/May 2018.

Often a format needs multiple parsing passes: the value of some element, which might be a string, a hexBinary, or an array of bytes, must itself be used as the input for more parsing.
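As a rough illustration of the multi-pass idea, here is a minimal Python sketch. It is not how Daffodil implements layering; the message layout (a 2-byte length, a base64 payload, and two 16-bit integers inside the payload) and all function names are hypothetical and chosen only for the example.

```python
import base64
import struct


def parse_outer(data: bytes) -> dict:
    """First pass: a 2-byte big-endian length followed by that many payload bytes."""
    (length,) = struct.unpack_from(">H", data, 0)
    return {"length": length, "payload": data[2:2 + length]}


def parse_inner(raw: bytes) -> dict:
    """Second pass: the decoded payload holds two unsigned 16-bit big-endian integers."""
    a, b = struct.unpack(">HH", raw)
    return {"a": a, "b": b}


def parse_layered(data: bytes) -> dict:
    """The value produced by the first pass becomes the input of the second pass."""
    outer = parse_outer(data)
    decoded = base64.b64decode(outer["payload"])  # the hypothetical "layer" transform
    return parse_inner(decoded)


# Build a sample message (all values are made up for the example).
inner = struct.pack(">HH", 7, 513)
encoded = base64.b64encode(inner)
message = struct.pack(">H", len(encoded)) + encoded
result = parse_layered(message)
```

The point of the sketch is only that the element value produced by one pass (the base64 text) is the data source for the next pass.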

...

The ability to have data of simple type become XML attribute values would go a long way toward making DFDL-created XML more human-friendly and more efficient: an element states its name twice (in the start tag and the end tag), whereas an attribute states it only once. E.g., <bigLongName>0</bigLongName> becomes bigLongName="0".
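The effect on the resulting XML can be sketched with a small post-processing step in Python. This is only an illustration of the output shape, not a proposed DFDL mechanism; the function name and the rule used to decide what counts as "simple" (no children, no attributes, name not repeated among siblings) are assumptions made for the example.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def fold_simple_children(elem: ET.Element) -> None:
    """Turn childless, attribute-free, non-repeated children into attributes, in place."""
    counts = Counter(child.tag for child in elem)
    for child in list(elem):
        is_simple = len(child) == 0 and not child.attrib
        if is_simple and counts[child.tag] == 1 and child.tag not in elem.attrib:
            elem.set(child.tag, child.text or "")
            elem.remove(child)
        else:
            # Complex or repeated children stay elements; recurse into them.
            fold_simple_children(child)


record = ET.fromstring(
    "<record><bigLongName>0</bigLongName>"
    "<item><id>1</id></item><item><id>2</id></item></record>"
)
fold_simple_children(record)
compact = ET.tostring(record, encoding="unicode")
```

After the call, `record` carries `bigLongName="0"` as an attribute and each `item` carries its `id` as an attribute, while the repeated `item` elements remain elements.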

More XML Schema Constructs

...

  1. repeating sequence and choice groups (minOccurs and maxOccurs)
  2. complex type derivations
  3. attributes (already mentioned above)
  4. substitution groups - to enable separate compilation of multi-part DFDL schemas that are very large. (might be overkill - unclear if this is truly needed.)

XML Schema 1.1

This new standard supports richer validation rules, which are useful because XML Schema 1.0's validation capabilities are so limited. Alternatively, Schematron rules could be embedded directly in a DFDL schema.

Delimited by Next Item

The ability to say that an element or group is delimited, with its end determined by finding the initiator of the next element or group, would simplify the description of many formats.
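A minimal Python sketch of the behavior being asked for, assuming a hypothetical text format in which every item starts with the same initiator and has no terminator of its own. The function name and the sample data are invented for the example.

```python
from typing import List


def split_by_next_initiator(data: str, initiator: str) -> List[str]:
    """Return the content of each item; an item ends where the next initiator begins."""
    items = []
    pos = data.find(initiator)
    while pos != -1:
        start = pos + len(initiator)
        nxt = data.find(initiator, start)
        end = nxt if nxt != -1 else len(data)  # the last item runs to end of data
        items.append(data[start:end])
        pos = nxt
    return items


items = split_by_next_initiator("REC:alphaREC:betaREC:gamma", "REC:")
```

Here no item declares its own length or terminator; the boundary is found purely by scanning for the initiator of whatever comes next.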

...

These functions need to be able to examine the Daffodil processor state (Infoset and data streams).

Note that checksums would need to work in conjunction with layering, as a layer would supply the 'raw' data for the checksum, but also allow the structure of that data to be expressed so that it can be parsed.

Security Features

No Network Mode: This is less a DFDL language feature than a characteristic desirable for all implementations of DFDL. Applications using DFDL must be able to execute in an environment that has no access to the internet and, even on machines that do have such access, in a mode where they make no attempt to access anything remotely.

...

DFDL schemas involve some large and complex regular expressions. Even the most advanced regular expression languages lack a convenient way to define a given construct once, name it, and then reuse it by referencing that name. Such a facility would dramatically ease construction of regular expressions; it is simply basic software engineering that large and complex things need to be named and reused, not duplicated.

A coherent proposal here would be very useful outside of DFDL/Daffodil.
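One way to picture named, reusable regular expression fragments is a small expansion step performed before the pattern is compiled. The Python sketch below is only an illustration: the `{{name}}` reference syntax, the `FRAGMENTS` table, and the `expand` function are all hypothetical and are not part of DFDL, Daffodil, or Python's `re` module.

```python
import re

# Named fragments; a fragment may reference other fragments with {{name}}.
FRAGMENTS = {
    "digit2": r"[0-9]{2}",
    "year": r"[0-9]{4}",
    "date": r"{{year}}-{{digit2}}-{{digit2}}",
    "time": r"{{digit2}}:{{digit2}}:{{digit2}}",
}

_REF = re.compile(r"\{\{(\w+)\}\}")


def expand(pattern: str, fragments: dict, _active: tuple = ()) -> str:
    """Replace each {{name}} with its definition, wrapped in a non-capturing group."""

    def substitute(match):
        name = match.group(1)
        if name in _active:
            raise ValueError(f"recursive fragment reference: {name}")
        body = expand(fragments[name], fragments, _active + (name,))
        return "(?:" + body + ")"

    return _REF.sub(substitute, pattern)


timestamp = re.compile(expand(r"{{date}}T{{time}}", FRAGMENTS))
```

With this, `digit2` is written once and used five times; `timestamp.fullmatch("2018-05-01T12:30:00")` matches, while a string with a one-digit month does not.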

Graph of Nodes and Edges Data

...

Some data formats contain header information including a table of offsets in the file to later parts of the data. The ability to directly express offsets within the data (absolute, or relative to some anchor, such as the end of the table of offsets) would make describing these kinds of data files much more direct.

A good example of such a format is TIFF.
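To make the idea concrete, here is a Python sketch of reading a file whose header is a table of absolute offsets. The layout is hypothetical and much simpler than TIFF: a 16-bit entry count, then one (offset, length) pair of 32-bit integers per section, all big-endian. The function names are invented for the example.

```python
import struct


def build_file(sections):
    """Write a count, a table of (absolute offset, length) entries, then the sections."""
    count = len(sections)
    header_size = 2 + count * 8  # uint16 count + two uint32 values per entry
    table = b""
    body = b""
    for section in sections:
        table += struct.pack(">II", header_size + len(body), len(section))
        body += section
    return struct.pack(">H", count) + table + body


def read_sections(data):
    """Follow each table entry to the section it points at."""
    (count,) = struct.unpack_from(">H", data, 0)
    sections = []
    for i in range(count):
        offset, length = struct.unpack_from(">II", data, 2 + i * 8)
        sections.append(data[offset:offset + length])
    return sections


blob = build_file([b"first", b"second part", b""])
recovered = read_sections(blob)
```

The reader never scans sequentially through the body; each section is located only through the offset stored in the header, which is the relationship a DFDL offset feature would need to express.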

Expression Language - Let - Return Construct

...

For DFDL 2.0 we should fix these annoyances. We should allow complex type derivations, combining properties on them in the exact same manner as we do for simple types. We should allow properties to be annotated on a complex type definition and for those to be combined with those on an element referencing that type.

Allowing Properties on Group Definitions and Group References

And... exactly the same thing for groups and group references. We don't allow these to be annotated currently. We should.

Separate LengthKind for Simple and Complex Types

...

Table and Range Lookups

(see concrete Proposal)

Often one has a representation containing enumerations (integer values) that have symbolic meanings. The parsed result from such data should contain strings so that the logical infoset is readable and understandable. A means is needed to specify a table of integer constants and their corresponding strings, to be used for both parsing and unparsing. Ranges are a generalization where a symbolic string is used to name all the integers that fall in a range.
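The lookup behavior can be sketched in Python as follows. The table contents and function names are hypothetical, and the choice to unparse a range name to the lower bound of its range is an assumption made for the example, not something the proposal specifies.

```python
# Each entry is (low, high, name); a single constant is a range with low == high.
RANGES = [
    (0, 0, "none"),
    (1, 1, "low"),
    (2, 9, "medium"),
    (10, 255, "high"),
]


def parse_value(n: int) -> str:
    """Parsing direction: map an integer from the data to its symbolic string."""
    for low, high, name in RANGES:
        if low <= n <= high:
            return name
    raise ValueError(f"no table entry covers {n}")


def unparse_value(name: str) -> int:
    """Unparsing direction: map a symbolic string back to an integer."""
    for low, high, label in RANGES:
        if label == name:
            return low  # assumption: the lower bound is the canonical value
    raise ValueError(f"unknown symbolic name {name!r}")
```

For single constants the mapping round-trips exactly. For ranges, parsing is many-to-one, so unparsing must pick some canonical integer; any real design would have to state that rule explicitly.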