Info

This design has been superseded by a new design.

Principles of Operation

Many global group defs exist with the intention that they are ONLY going to be used as hidden groups. An example of this is presence-bit indicator flags. These are 1-bit elements that live in a hidden group because they indicate the presence or absence of an element in the data. These flags can be used via dfdl:occursCount and dfdl:occursCountKind='expression', or via flags, choices, and discriminators. Either way they are a common case of hidden groups.

Other global group defs are created with the intention that they are only going to be used as non-hidden groups. However, users are free to use these as hidden groups.

Polymorphic Terms, which are sometimes hidden and sometimes not within the same schema, are expected to be far less common. Many schemas are expected to have no such groups. In those schemas, every Term will be known to be hidden or known to be not hidden.

Choices, and the choice branch maps used for unparsing, are related to the isHidden problem due to the transition plan below.

Implementation: Schema Compiler

The Daffodil schema compiler has Root.refMap, which tells us which group refs refer to a particular group def. From it we can determine whether all such refs are hidden group refs, none are hidden group refs, or they are a mixture. That tells the schema compiler whether a given global group def is always hidden, never hidden, or a mixture.

Furthermore, by looking at the transitive closure of the refMap, one can determine for every Term whether it always appears within a hidden group, never appears within a hidden group, or some of both.

This calculation can be done in a single walk of the DSOM tree structure. Its complexity is linear in the number of Term objects in the DSOM tree.
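The classification can be illustrated with a small, self-contained sketch. It is not Daffodil code: RefSite, classify, and the string group names are hypothetical stand-ins for DSOM objects, and the sketch assumes group defs are not recursive. Following the transitive-closure statement above, a non-hidden group ref inherits the status of the group def that lexically encloses it; memoization ensures each group def is visited once.

```scala
object HiddenAnalysis {
  // Hypothetical stand-in for one group ref in the schema.
  // enclosingDef is the global group def lexically containing the ref,
  // or None when the ref sits under the root element.
  final case class RefSite(enclosingDef: Option[String], target: String, isHiddenGroupRef: Boolean)

  // Returns, for every referenced group def: Some(true) = always hidden,
  // Some(false) = never hidden, None = mixture.
  def classify(sites: List[RefSite]): Map[String, Option[Boolean]] = {
    // Plays the role of Root.refMap: group def name -> all refs to it.
    val refMap: Map[String, List[RefSite]] = sites.groupBy(_.target)
    val memo = scala.collection.mutable.Map.empty[String, Option[Boolean]]

    def statusOfDef(name: String): Option[Boolean] =
      memo.get(name) match {
        case Some(known) => known
        case None =>
          val perSite = refMap.getOrElse(name, Nil).map(statusOfSite)
          val status: Option[Boolean] =
            if (perSite.nonEmpty && perSite.forall(_.contains(true))) Some(true)
            // An unreferenced def is treated as not hidden (an arbitrary choice here).
            else if (perSite.forall(_.contains(false))) Some(false)
            else None
          memo(name) = status
          status
      }

    // A hidden group ref hides its target outright; otherwise the target
    // inherits whatever is known about the ref's lexical context.
    def statusOfSite(site: RefSite): Option[Boolean] =
      if (site.isHiddenGroupRef) Some(true)
      else site.enclosingDef.fold[Option[Boolean]](Some(false))(statusOfDef)

    refMap.keys.map(name => name -> statusOfDef(name)).toMap
  }
}
```

In this sketch a group reached only through a hidden group ref (directly or several refs away) comes out as Some(true), while a group referenced from both a hidden and a non-hidden context comes out as None.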

The algorithm is roughly:

Code Block
// on class Term (for every term in the schema)
lazy val optIsKnownHidden : Option[Boolean] = 
     case the term is a model group, and the parent of the model group is a global group def:
       using the ref map, find all group refs referring to this group def
         if all are hidden group refs
           then check that every element of simple type is defaultable or has dfdl:outputValueCalc; SDE otherwise.
                result is Some(true)
         else if none are hidden group refs then result is Some(false)
         else result is None
     case the term is any other kind of term, and it has a lexically enclosing model group:
       the result is the optIsKnownHidden of the lexically enclosing model group.
     case the term is the root element: result is Some(false)

This Term.optIsKnownHidden is also carried on TermRuntimeData structures for all Terms.

If Term.optIsKnownHidden is Some(true), the schema compiler should check that every element of simple type is either defaultable or has dfdl:outputValueCalc. It is an SDE otherwise.

No check is required if Term.optIsKnownHidden is Some(false). If Term.optIsKnownHidden is None, the check occurs at runtime.

TBD: it may be useful to compute a global attribute on the root which indicates whether there are any of these mixed-hidden Terms. If all Terms are known to be either hidden or non-hidden with no ambiguity, further optimizations may apply. Other runtime backends may even disallow these mixed-hidden Terms. For Runtime1, however, the runtime overhead of this implementation is expected to be so small that this may be unnecessary, and profiling studies should indicate whether further performance attention is needed.

Implementation: Runtime var DIElement.isHidden

This flag member is set on infoset elements at the time they are created (parsing) or spliced into the infoset (unparsing - streaming unparser).

We dynamically maintain a boolean PState/UState member, isInsideHiddenContext, at runtime.

ParseOrUnparseState has the member:

Code Block
var isInsideHiddenContext : Boolean = false

In the parse1 and unparse1 methods of Parser and Unparser respectively, we implement the following (the example shows the parser):

Code Block
trd match {
  case srd: SequenceGroupRef if srd.optIsKnownHidden.contains(true) =>
    if (state.isInsideHiddenContext) {
      // Already inside a hidden context; an outer group ref owns the flag.
      parse(state)
    } else {
      // Outermost hidden group ref: set the flag for the duration of the parse.
      state.isInsideHiddenContext = true
      parse(state)
      state.isInsideHiddenContext = false
    }
  case _ => parse(state)
}
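The nesting behavior of the flag can be illustrated in isolation. The following is a minimal sketch, not Daffodil code: State and withHiddenContext are hypothetical names, and the try/finally restore is an assumption added here (the design above simply resets the flag after parse returns).

```scala
object HiddenContextDemo {
  // Hypothetical stand-in for ParseOrUnparseState.
  final class State { var isInsideHiddenContext: Boolean = false }

  // Runs body with the flag set when entering the outermost hidden group ref.
  // Only the outermost hidden group ref sets and clears the flag, so nested
  // hidden group refs leave it untouched.
  def withHiddenContext[A](state: State, optIsKnownHidden: Option[Boolean])(body: => A): A =
    if (optIsKnownHidden.contains(true) && !state.isInsideHiddenContext) {
      state.isInsideHiddenContext = true
      try { body } finally { state.isInsideHiddenContext = false }
    } else {
      body
    }
}
```

With this shape, a hidden group ref nested inside another hidden group ref does not clear the flag when it finishes; the flag returns to false only when the outermost one exits.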

...

SequenceRuntimeData has constructor argument/member:

Code Block
val maybeHiddenGroupRefArg : Maybe[ModelGroupRuntimeData] 
lazy val maybeHiddenGroupRef = maybeHiddenGroupRefArg // and also serialized

with value Nope for anything except a SequenceGroupRef that has the dfdl:hiddenGroupRef property. This is a Maybe type to provide for access to the referenced group's model-group runtime data. However, for purposes of isHidden computation this is just used as a boolean flag.

...

When infoset elements are created by element combinators (parsing), or when they are accepted and spliced into the infoset by element combinators (unparsing), they call DIElement.setIsHidden:

Code Block
elem.setIsHidden {
  if (erd.optIsKnownHidden.isDefined) {
    // No checking needed. It should have been done at schema compile time.
    erd.optIsKnownHidden.get
  } else {
    // Mixed case: hidden status depends on the dynamic context.
    val res = state.isInsideHiddenContext
    if (res && erd.isSimpleType && !erd.isDefaultable && !erd.isOutputValueCalc)
      state.SDE("...must be defaultable or OVC...") // checking in runtime case.
    res
  }
}
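The decision made inside setIsHidden can be written as a pure function, which makes the three cases easy to test. This is a sketch under stated assumptions: ElemInfo and resolve are hypothetical names, and the runtime SDE is modeled as a Left value rather than a call on the state.

```scala
object IsHiddenResolution {
  // Hypothetical, simplified view of the element runtime data used by the decision.
  final case class ElemInfo(
    optIsKnownHidden: Option[Boolean],
    isSimpleType: Boolean,
    isDefaultable: Boolean,
    isOutputValueCalc: Boolean
  )

  // Right(isHidden) on success; Left(message) models the runtime SDE.
  def resolve(erd: ElemInfo, isInsideHiddenContext: Boolean): Either[String, Boolean] =
    erd.optIsKnownHidden match {
      // Statically known: any check was already done at schema compile time.
      case Some(known) => Right(known)
      // Mixed: take the dynamic context, checking simple-type elements only when hidden.
      case None =>
        val violates =
          isInsideHiddenContext && erd.isSimpleType && !erd.isDefaultable && !erd.isOutputValueCalc
        if (violates) Left("hidden simple element must be defaultable or have dfdl:outputValueCalc")
        else Right(isInsideHiddenContext)
    }
}
```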

Runtime Checking

...

When setIsHidden(true) is called on an element of simple type, the element should be checked to ensure it is either defaultable or has dfdl:outputValueCalc. If it is neither, it is a runtime SDE, as shown above.

Implementation: Choice Combinators

The ChoiceCombinator.unparser method computes a choiceBranchEventMap, and also determines statically which branch should be taken if the choice is hidden.

If trd.optIsKnownHidden is None, then both of these are passed as parameters to the ChoiceUnparserCombinator, which decides at runtime whether it is unparsing a hidden choice. If so, it uses the statically determined branch; otherwise it uses the choiceBranchEventMap. If trd.optIsKnownHidden is Some(true), the choiceBranchEventMap can be omitted, and if trd.optIsKnownHidden is Some(false), the statically determined branch can be omitted.

The choiceBranchEventMap is computed without regard for isHidden; that is, it assumes the current choice is not itself hidden. The contents of each branch of the choice may or may not contain hidden content.
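The branch selection described above might look roughly like the following. This is only a sketch: ChoiceUnparserSketch, select, and the use of plain strings for branches and infoset events are illustrative assumptions, not the Daffodil classes.

```scala
object ChoiceBranchSelection {
  type Branch = String // hypothetical stand-in for a branch unparser

  final case class ChoiceUnparserSketch(
    optIsKnownHidden: Option[Boolean],
    // Statically determined branch; may be omitted when known not hidden.
    hiddenBranch: Option[Branch],
    // Event-keyed map; may be omitted when known hidden.
    choiceBranchEventMap: Option[Map[String, Branch]]
  ) {
    // Picks the branch to unparse; None means no branch matches the event.
    def select(isInsideHiddenContext: Boolean, nextEvent: String): Option[Branch] = {
      // Static knowledge wins; only the mixed case consults the runtime flag.
      val hidden = optIsKnownHidden.getOrElse(isInsideHiddenContext)
      if (hidden) hiddenBranch
      else choiceBranchEventMap.flatMap(_.get(nextEvent))
    }
  }
}
```

Only the mixed case (None) needs both parameters populated; the two statically known cases each carry one and ignore the runtime flag.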

Test Plan/Design-for-Test

Tests should ensure that elements are properly hidden if they are multiple group references away from a true dfdl:hiddenGroupRef.

Test schemas with groups that appear both hidden and non-hidden are required to ensure the runtime determination is exercised.

Transition Plan

Releases 2.5.0 and prior did not use this technique.

...

Since we cannot know if an element will be hidden or not until runtime, many checks currently done at schema compile time must instead also be done at runtime. For example, in childrenInHiddenGroupNotDefaultableOrOVC, a check is done for whether an element inside a hidden group is neither defaultable nor has dfdl:outputValueCalc. This test must be done at runtime instead as described above, in addition to doing it at compile time when optIsKnownHidden is Some(true).

Static isHidden parameters: all of these must be removed. Anything that statically depends on knowledge of isHidden must be revised so that it does not depend on full knowledge of it.

Static analysis changes: many currently static attributes, such as possibleFirstChildElementsInInfoset, depend on isHidden being statically determined. There is a specific static childrenInHiddenGroupNotDefaultableOrOVC attribute which must be removed.

Transitioning to the new isHidden technique requires transitioning the ChoiceBranchMap technique at the same time.

  • The static isHidden attribute is used by possibleFirstChildElementInInfoset.
  • possibleFirstChildElementInInfoset is used by possibleNextChildElementInInfoset.
  • possibleNextChildElementInInfoset is used by nextParentElements, which recursively uses possibleNextChildElementInInfoset.
  • possibleNextChildElementInInfoset is used by identifyingEventsForChoiceBranch.

Those are the only uses of the static isHidden attribute. While we still must compute identifyingEventsForChoiceBranch, we must do so without reference to static isHidden information. The members possibleFirstChildElementInInfoset, possibleLastChildElementInInfoset, and nextParentElements computations may be eliminated or adapted.

The algorithm for identifyingEventsForChoiceBranch must be adapted to not require upward navigation (back-pointers).

All such properties should be computed on global group definitions, not repeatedly for every group reference.

Algorithm TBD

  • possibleFirstChildElementInInfoset calls possibleFirstChildTerms
  • possibleFirstChildTerms calls possibleNextSiblingTerms
  • possibleNextSiblingTerms calls enclosingTerms (note terms plural)

This seems problematic, as it determines possible next sibling terms for ALL contexts. It may even allow erroneous or invalid infoset event streams to be unparsed.

We may need a runtime-structure akin to or perhaps the same as the TRD Stack technique used in InfosetInputter.