Comment: Update obsolete sections that are now implemented.

...

The DFDL Workgroup of the Open Grid Forum has also been saving some issues targeted at DFDL 2.0.

Recursion

One of the first things people want to model in DFDL always seems to be a legacy document format such as RTF or older MS Word documents. These have recursive structures, where a section can contain text and other sections. DFDL v1.0 was not designed with document formats in mind, but rather with more traditional "data sets" or files of data.

...

Layering - Data Source/Target Indirection

(This has an initial implementation now. The API may still evolve.)

The layering feature of Daffodil needs to be extended to enable new external layer transforms to be added via external jars.

...

  1. repeating sequence and choice groups (minOccurs and maxOccurs)
  2. complex type derivations
  3. attributes (already mentioned above)
  4. substitution groups - to enable separate compilation of multi-part DFDL schemas that are very large. (might be overkill - unclear if this is truly needed.)

XML Schema 1.1 / Schematron

(Schematron is now implemented. Rules can be separate or embedded in the schema.)

This new standard supports richer validation rules. They are useful since XML Schema 1.0's validation capabilities are so limited.

Alternatively, embedding Schematron rules directly in a DFDL schema is an option.

...

Another formulation would be to specify that an element/group is delimited, but that the terminating markup is not consumed, and hence, must be consumed by whatever comes next in the model.
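As a rough illustration of that formulation, the following Python sketch scans for a terminator and either consumes it or leaves it in place for whatever comes next. The function name scan_delimited and the consume flag are hypothetical, invented for this example; they are not part of DFDL or Daffodil.

```python
def scan_delimited(data: str, pos: int, terminator: str, consume: bool = True):
    """Return (value, new_pos) for a field that ends at `terminator`.

    Hypothetical helper for illustration only. With consume=False the
    terminator is located but left in the data, so the next item in the
    model is responsible for consuming it.
    """
    end = data.find(terminator, pos)
    if end < 0:
        raise ValueError("terminator not found")
    value = data[pos:end]
    # Either skip past the terminator or stop just before it.
    new_pos = end + len(terminator) if consume else end
    return value, new_pos


# The field is delimited by ";" but the ";" itself is not consumed.
value, pos = scan_delimited("abc;def", 0, ";", consume=False)
# The next item in the model now starts at the terminator.
next_starts_with_terminator = "abc;def".startswith(";", pos)
```

With consume=False, the position returned points at the terminator rather than past it, which is the behavior the paragraph above describes.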

Character Class Entities

(Now DAFFODIL-2720 in the ASF JIRA.)

We badly need an entity that means 'any whitespace that is not a line ending'. This avoids the specification of separators like:

...

A possible good name is %LSP*; where the "L" stands for "Linear", as in "within a line", meaning that it matches only tab or space characters (exactly U+0009 and U+0020, not other Unicode space-like characters).

5 new entities are actually needed: %LSP; %LSP*; %LSP+; %SP*; %SP+;
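One way to picture the intended matching behavior is to map each proposed entity onto a regular expression. The Python sketch below does this under stated assumptions: the table PROPOSED_ENTITIES and the helper entity_matches are hypothetical, and the reading of %SP*; and %SP+; as "zero or more" and "one or more" U+0020 characters is an assumption, since the text above only spells out the meaning of the %LSP family.

```python
import re

# Hypothetical mapping from the proposed entities to regular expressions.
# The %LSP patterns follow the text: exactly U+0009 (tab) or U+0020 (space).
# The %SP patterns are an assumed reading: U+0020 only.
PROPOSED_ENTITIES = {
    "%LSP;": r"[\t ]",    # exactly one tab or space
    "%LSP*;": r"[\t ]*",  # zero or more tabs or spaces
    "%LSP+;": r"[\t ]+",  # one or more tabs or spaces
    "%SP*;": r" *",       # zero or more spaces (assumed meaning)
    "%SP+;": r" +",       # one or more spaces (assumed meaning)
}


def entity_matches(entity: str, text: str) -> bool:
    """Return True if `text` is matched in full by the proposed entity."""
    return re.fullmatch(PROPOSED_ENTITIES[entity], text) is not None
```

Under this reading, a line ending or a non-breaking space (U+00A0) never matches any of the five entities, which is the point of restricting them to "linear" whitespace.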

Summary Functions/Operations

...

Security Features

No Network Mode: This is less a DFDL language feature than a characteristic desirable for all implementations of DFDL. Applications using DFDL must be able to execute both in an environment that has no access to the internet and, even on machines that do have such access, in a mode where they make no attempt to access anything remotely. (This is implemented now.)

Regular Expression Enhancements

...

Very often one wants dfdl:lengthKind='delimited' or dfdl:lengthKind='explicit' for simple types, but dfdl:lengthKind='implicit' for complex types. Separating dfdl:lengthKind into two properties, or having the ability to specify it either way, would simplify many schemas that otherwise have an error-prone need for a dfdl:ref='complex' format reference on every element of complex type to override the default dfdl:lengthKind. The alternative is to split the schema, putting all simple types in one file (and using only those simple types) and all complex types in another.

Table and Range Lookups / Symbolic Enumerations

(See the concrete proposal. This is now implemented.)

Often one has a representation containing enumerations (integer values) that have symbolic meanings. The parsed result from such data should contain strings, so that the logical infoset is readable and understandable. A means is needed to specify a table of integer constants and their corresponding strings, to be used for both parsing and unparsing. Ranges are a generalization in which a symbolic string names all the integers that fall in a range.
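The idea can be sketched in Python as a small lookup table used in both directions. This is only an illustration of the concept, not how Daffodil implements it. The names Entry, TABLE, parse_value, and unparse_value, and the sample values, are hypothetical. The "canonical" integer written when unparsing a range is also an assumption, since a range maps many integers to one string and some rule is needed to pick one on the way back.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    low: int        # lowest integer covered (inclusive)
    high: int       # highest integer covered (inclusive)
    name: str       # symbolic string placed in the infoset
    canonical: int  # integer written when unparsing (assumed rule)


# Hypothetical table: two single-value entries and one range.
TABLE = [
    Entry(0, 0, "off", 0),
    Entry(1, 1, "on", 1),
    Entry(2, 255, "reserved", 2),
]


def parse_value(n: int) -> str:
    """Map a represented integer to its symbolic string."""
    for entry in TABLE:
        if entry.low <= n <= entry.high:
            return entry.name
    raise ValueError(f"no symbolic name for {n}")


def unparse_value(name: str) -> int:
    """Map a symbolic string back to the integer to be written."""
    for entry in TABLE:
        if entry.name == name:
            return entry.canonical
    raise ValueError(f"unknown symbolic name {name!r}")
```

Note that a range makes the mapping lossy: in this sketch both 2 and 200 parse to "reserved", but "reserved" always unparses to 2.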

...