...

The current Parser/Unparser objects are specific to daffodil-runtime1. Introducing daffodil-runtime2 requires replacing Parser/Unparser in the list above with RuntimeGenerator, an object that is the output of the Daffodil schema compiler and encapsulates runtime-specific optimizations and behavior. The list above becomes:

  • DSOM
  • Gram
  • RuntimeGenerator
    • For daffodil-runtime1 - Parser/Unparser objects created by parser() and unparser() methods - these are the runtime-specific objects that actually carry out parsing/unparsing. There are also RuntimeData classes which store information used by parsers/unparsers, and Evaluatable objects which encapsulate compiled expressions for evaluation at runtime.
    • For daffodil-runtime2 - Code generated by the generateCode() method. Creates codgen.ast.Generator objects. (TBD: or objects encapsulating those in some manner.)

The DSOM and Gram layers of the Daffodil schema compiler should be runtime-independent.

...

DSOM - Daffodil Schema Object Model

...

  • Only lengthKind 'explicit' or 'implicit' for simple types, and only lengthKind 'implicit' for complex types.

  • Only types long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte, float, double, string, and hexBinary are supported.

    • This leaves out decimal, integers wider than 64 bits, boolean, and the date/time-related types.

  • The dfdl:representation is always 'binary'. No text numbers are supported.

  • As the dfdl:binaryNumberRep is always 'binary', integers are fixed-length 2’s complement.

  • When added, note that dfdl:occursCountKind='expression', together with choices that use only dfdl:choiceDispatchKey and dfdl:choiceBranchKey, implies that no backtracking/discrimination is required.

    • Rationale: Direct-dispatch choices and requiring only dfdl:occursCountKind='expression' mean there are no points of uncertainty, so there is no backtracking.
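To illustrate why these restrictions remove the need for backtracking, here is a minimal Java sketch of what generated parsing code might look like. The class names (Body, Ping, DataBody, BodyParser) and the byte layout (a 1-byte dispatch tag, then either a 4-byte integer or a 1-byte count followed by that many 2-byte integers) are hypothetical and not taken from Daffodil. The point is only that a dispatch key turns a choice into a switch, and an expression-supplied count turns an array into a counted loop.

```java
import java.nio.ByteBuffer;

// Hypothetical infoset POJOs for a choice with two branches.
abstract class Body {}
final class Ping extends Body { int sequence; }
final class DataBody extends Body { short[] values; }

final class BodyParser {
    // Assumed layout: a 1-byte tag, then the branch selected by that tag.
    // ByteBuffer reads big-endian by default.
    static Body parse(ByteBuffer in) {
        int tag = in.get() & 0xFF;               // plays the role of dfdl:choiceDispatchKey
        switch (tag) {                           // each case plays the role of a dfdl:choiceBranchKey
            case 1: {
                Ping ping = new Ping();
                ping.sequence = in.getInt();     // fixed-length 2's complement, 4 bytes
                return ping;
            }
            case 2: {
                DataBody data = new DataBody();
                int count = in.get() & 0xFF;     // occursCountKind 'expression': count comes from earlier data
                data.values = new short[count];
                for (int i = 0; i < count; i++) {
                    data.values[i] = in.getShort();
                }
                return data;
            }
            default:
                // No branch is tried speculatively; an unknown key is an error.
                throw new IllegalArgumentException("no choice branch for key " + tag);
        }
    }
}
```

Because the branch and the occurrence count are both known before any element is read, the parser never has to save state and retry.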

...

Properties that end up being required but shouldn't be (e.g., anything about text numbers or date/time) are bugs in Daffodil that should be reported. An include-file DFDL format definition should hide these from users so they are not distracting.

Phases

The above restrictions on the features suggest dividing the implementation of Runtime 2 into two distinct phases:

  • Phase 1: (aka Runtime2P1) No expressions. All lengths are fixed. All arrays have fixed length.
  • Phase 2: (aka Runtime2P2) Adding the DFDL expression language, lengthKind 'explicit', occursCountKind 'expression'.
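As a rough picture of the Phase 1 scope, the following Java sketch shows a POJO and parser for a schema in which every field and array has a fixed length. The names (Sample, SampleParser) and the layout (a 4-byte int, a 2-byte short, and an array of exactly three 4-byte ints, big-endian) are assumptions made for illustration; actual generated code may look quite different.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical POJO for a schema with two fixed-length fields and a fixed-size array.
final class Sample {
    int id;                             // xs:int, 4 bytes
    short flags;                        // xs:short, 2 bytes
    final int[] readings = new int[3];  // array with a fixed occurrence count of 3
}

final class SampleParser {
    // With no expressions, the total length is known at compile time: 4 + 2 + 3 * 4.
    static final int LENGTH_IN_BYTES = 18;

    static Sample parse(byte[] data) {
        if (data.length < LENGTH_IN_BYTES) {
            throw new IllegalArgumentException(
                "need " + LENGTH_IN_BYTES + " bytes, got " + data.length);
        }
        ByteBuffer in = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        Sample sample = new Sample();
        sample.id = in.getInt();
        sample.flags = in.getShort();
        for (int i = 0; i < sample.readings.length; i++) {
            sample.readings[i] = in.getInt();
        }
        return sample;
    }
}
```

In Phase 2, the constant length and the constant loop bound would be replaced by values computed from expressions over earlier fields.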

Goals

  • Use Julian Feinauer's contributed code generation library so that Java, C++, and Python backends are possible for Runtime 2
  • Initial focus is a backend where the Infoset is Java POJO objects. The POJO definitions are part of the generated code, which is output as one or multiple text files.
  • DPath expressions (when implemented) compile into native language expressions that navigate Infoset objects the way handwritten code would.

  • Dependencies on Java garbage collection should be minimized and documented.
  • The runtime library should have a minimal footprint.

    • Selective linking can be assumed (even for Java - search for GraalVM)

  • Satisfy the requirements that caused the PLC4X project to create their own MSpec data format language. (Alternatively, Daffodil with Runtime 2 should be a good target for MSpec compilation.)

    • With one exception: DFDL is still going to be XML Schema based. Changing the syntax of the DFDL language is out of scope, as that's a front-end project. This is a runtime/back-end project.
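The DPath goal above can be pictured with a small Java sketch. The POJOs (Header, Packet) and the helper PacketExpressions are hypothetical, as is the example expression; the sketch only shows the intended shape, in which an expression becomes ordinary field navigation rather than a call into an expression interpreter.

```java
// Hypothetical generated infoset POJOs.
final class Header {
    int payloadLength;
}

final class Packet {
    final Header header = new Header();
    byte[] payload;
}

final class PacketExpressions {
    // A DFDL expression such as { ../header/payloadLength }, evaluated while
    // processing 'payload', might compile to a plain field access like this.
    static int payloadLengthOf(Packet packet) {
        return packet.header.payloadLength;
    }
}
```

A generated parser could then size the payload with `new byte[PacketExpressions.payloadLengthOf(packet)]`, which is what handwritten code would do.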

...