
There is an implementation of this in Daffodil as of 2019-06-19 (git hash 015891ff982144ab07f092a25ab133707a9a31e9). See the SchemaComponent.scala file.

Short Schema Component Designators (SSCD)

This is loosely based on the concepts of the W3C Schema Component Designator (SCD) spec:

http://www.w3.org/TR/xmlschema-ref/

However, the W3C SCD must be adapted to our needs. It is too verbose for diagnostic messages, it has no schema component instance paths, and it has no notion of a schema document.


  • Component: a schema component is one of the things a schema author writes in a schema.
  • Component Instance: a schema component instance is the non-shared instance of a sharable schema component, that is, the component as it appears in one specific usage context.
    • For example: a global type definition must be referenced from an element to be used. The type as it appears in the context of that element is called an 'instance' of that schema component.
  • Occurrence: in the data and the infoset, the data corresponding to a schema element declaration is called an 'occurrence' of the element. (Not to be confused with 'instance'.)


SSCDs are conceptually related to SCDs, but they differ in several ways:

  • relative schema component designators
  • only the minimal set of axes that DFDL needs
    • Note, however, that we may need attributes. E.g., to refer to the maxOccurs attribute of a specific element declaration, we would write e=e1@maxOccurs.
  • only the very abbreviated syntactic forms
  • our own abbreviated versions of sequence, choice, and group reference path steps.
  • quasi-elements for access to DFDL annotations
  • convention for referring to a specific schema document (via URI)

We abbreviate 'model::sequence' and 'model::choice', which are too verbose for our purposes, as 'S' and 'C'. However, since these could be confused with elements named 'S' and 'C', we use the official W3C verbose notation whenever there is any ambiguity.
Similarly, we abbreviate a reference to group g1 as 'G::g1'.
As long as our notation maps easily onto an official W3C SCD, it can be more abbreviated. Our APIs should be able to return either official W3C SCDs or our abbreviated variant.
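The sketch below makes the fallback rule concrete. It is written in Scala, the language of the implementation notes further down. It assumes that "ambiguity" means an element in the same scope whose local name equals the abbreviation. The object and method names are hypothetical and are not Daffodil's.

```scala
object ModelGroupStep {
  sealed trait ModelGroupKind
  case object SequenceKind extends ModelGroupKind
  case object ChoiceKind extends ModelGroupKind

  /** Short step for a sequence or choice. Falls back to the verbose W3C SCD
    * form when an element in the same scope has the abbreviation's name
    * (this notion of "scope" is an assumption, not from the design). */
  def step(kind: ModelGroupKind, elementNamesInScope: Set[String]): String = {
    val (short, verbose) = kind match {
      case SequenceKind => ("S", "model::sequence")
      case ChoiceKind   => ("C", "model::choice")
    }
    if (elementNamesInScope.contains(short)) verbose else short
  }

  /** A reference to the named group g1 is abbreviated as "G::g1". */
  def groupRefStep(groupName: String): String = s"G::$groupName"
}
```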
Schema Component Instances in Context
Our paths are schema component instance designators: longer paths that can reach across a reference within the schema. For example, we need to refer to an element that has a named type, and also to things inside that element's instance of the type. Similarly, we need to reach across group references and element references.
This is done by simply continuing the path. For example, suppose element e1 has named type t1, a complex type containing a reference to a named group g1. Group g1 contains a sequence, which contains two child sequences, each of which contains elements e2 and e3.
A Daffodil Short Schema Component Instance Designator (SSCID) corresponding to the e2 in the second child sequence would be:
e1/~t1/G::g1/S/S[2]/e2
This has no direct counterpart in a W3C SCD, because XML Schema is context-free; hence, there is no need for paths that give the enclosing context. In Daffodil, however, context matters greatly, so our SSCIDs allow these longer paths.
Implementation
The SchemaComponent class has some abstract methods:
def sSCIDStep: SSCIDStep
The type SSCIDStep stands for 'short SCD step'.
Final methods on SchemaComponent will assemble the complete relative Short Schema Component Instance Designator (SSCID) from these steps. The resulting paths are relative to the root (document) element:
def sSCID: SSCID
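Here is a minimal Scala sketch of how an abstract per-component step and a final path-assembling method could fit together. SSCIDStep is modeled as a simple string wrapper. The parent link, the Named class and the path helper are hypothetical scaffolding. Daffodil's actual SchemaComponent is far richer than this.

```scala
object SscidSketch {
  // Hypothetical: the text only says SSCIDStep is a type; a String wrapper suffices here.
  final case class SSCIDStep(text: String)

  // A complete relative designator: steps from the root element down, joined by "/".
  final case class SSCID(steps: List[SSCIDStep]) {
    override def toString: String = steps.map(_.text).mkString("/")
  }

  abstract class SchemaComponent(val parent: Option[SchemaComponent]) {
    // Abstract: each kind of component knows how to render its own step.
    def sSCIDStep: SSCIDStep

    // Final: climb toward the root, prepending each step, so the list is root-first.
    final def sSCID: SSCID = {
      @scala.annotation.tailrec
      def loop(sc: SchemaComponent, acc: List[SSCIDStep]): List[SSCIDStep] =
        sc.parent match {
          case Some(p) => loop(p, sc.sSCIDStep :: acc)
          case None    => sc.sSCIDStep :: acc
        }
      SSCID(loop(this, Nil))
    }
  }

  // Trivial concrete component, used only for this illustration.
  final class Named(step: String, enclosing: Option[SchemaComponent])
      extends SchemaComponent(enclosing) {
    def sSCIDStep: SSCIDStep = SSCIDStep(step)
  }

  // Builds a chain of components from the root step down and returns the innermost one.
  def path(first: String, rest: String*): SchemaComponent =
    rest.foldLeft[SchemaComponent](new Named(first, None)) { (outer, s) =>
      new Named(s, Some(outer))
    }
}
```

For example, `SscidSketch.path("e1", "~t1", "G::g1", "S", "S[2]", "e2").sSCID.toString` yields the example path from the previous section. The walk prepends each step as it climbs, so no final reversal is needed.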

SSCD Syntax

An SSCD consists of a number of path steps separated by ":".

When an SSCD path step refers to a named declaration or definition in a DFDL schema, the QName of that construct is used. If the schema has no target namespace, this QName has no prefix; it is a local name only. When the schema has a target namespace, the QName uses the usual prefix:name syntax, where the prefix is one of the prefixes defined for that namespace.

An NCName is the local part of a name, that is, the name without a namespace prefix.

A path step is constructed as per the table below. The single letter "N" denotes the position of the construct within its lexically enclosing parent, and the number appears only if that position is greater than 1. Positions are 1-based, so position 1 gets no number. This provides uniqueness, but it does not provide XPath-style indexing by kind of construct. That is, if an element reference is followed by a sequence, then the element reference gets N and the sequence gets N+1, even though they are not the same kind of construct.
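The numbering rule can be illustrated with a short Scala sketch. It gives each child of a parent a shared 1-based position, regardless of construct kind, and omits the number for position 1. The names are hypothetical. The sketch only produces the "kind plus N" part of a step; any "=QName" part would be appended afterwards.

```scala
object PositionSuffix {
  /** Suffix for a 1-based position: empty for 1, otherwise the number itself. */
  def suffix(position: Int): String = if (position > 1) position.toString else ""

  /** Given the step prefixes of a parent's children in lexical order
    * (e.g. "er", "s"), attach each child's shared 1-based position. */
  def number(childPrefixes: List[String]): List[String] =
    childPrefixes.zipWithIndex.map { case (prefix, i) => prefix + suffix(i + 1) }
}
```

With this rule, an element reference followed by two sequences numbers as `er`, `s2`, `s3`. The counter is shared across kinds, so the sequences do not restart at 1.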

Construct → SSCD Path Step

  • Element Reference, <xs:element ref="QName" ...> → erN=QName
  • Local Element Decl, <xs:element name="name" ...>
    • → eN=QName if the element form is "qualified"
    • → eN=NCName if the element form is "unqualified"
  • Global Element Decl, <xs:element name="name" ...> → e=QName
  • Global Complex Type Def, <xs:complexType name="name" ...> → ct=QName
  • Local Complex Type Def, <xs:element ...><xs:complexType>... → ct
  • Element's Type Reference to a Global Complex Type → ct=QName
  • Global Simple Type Def, <xs:simpleType name="name" ...> → st=QName
  • Local Simple Type Def, <xs:element ...><xs:simpleType><xs:restriction base="QName">... → st=QName (the QName of the base type)
  • Element's Type Reference to a Global Simple Type or to a primitive type → st=QName
  • Choice Group → cN
  • Sequence Group → sN
  • Global Choice Group Def, <xs:group name="name"><xs:choice ...>... → cgd=QName
  • Global Sequence Group Def, <xs:group name="name"><xs:sequence ...>... → sgd=QName
  • Group Reference to a Global Choice Group Def → cgrN=QName
  • Group Reference to a Global Sequence Group Def → sgrN=QName
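The table can also be read as a rendering function. The following Scala sketch encodes each row as one case of a hypothetical Construct type and joins the steps with ":". None of these types are Daffodil classes. QNames are passed in as already-formatted strings, with one exception: a local element declaration takes a prefix and a local name, so that its qualified or unqualified form can choose between QName and NCName.

```scala
object SscdStep {
  sealed trait Construct
  final case class ElementRef(pos: Int, qname: String) extends Construct
  final case class LocalElementDecl(pos: Int, prefix: Option[String], local: String, qualified: Boolean) extends Construct
  final case class GlobalElementDecl(qname: String) extends Construct
  final case class GlobalComplexTypeDef(qname: String) extends Construct
  case object LocalComplexTypeDef extends Construct
  final case class ComplexTypeRef(qname: String) extends Construct
  final case class GlobalSimpleTypeDef(qname: String) extends Construct
  final case class LocalSimpleTypeDef(baseQName: String) extends Construct
  final case class SimpleTypeRef(qname: String) extends Construct
  final case class ChoiceGroup(pos: Int) extends Construct
  final case class SequenceGroup(pos: Int) extends Construct
  final case class GlobalChoiceGroupDef(qname: String) extends Construct
  final case class GlobalSequenceGroupDef(qname: String) extends Construct
  final case class ChoiceGroupRef(pos: Int, qname: String) extends Construct
  final case class SequenceGroupRef(pos: Int, qname: String) extends Construct

  /** QName text: the local name alone when there is no prefix (no target namespace). */
  def qname(prefix: Option[String], local: String): String =
    prefix.fold(local)(p => s"$p:$local")

  // The "N" of the table: omitted for position 1.
  private def n(pos: Int): String = if (pos > 1) pos.toString else ""

  def step(c: Construct): String = c match {
    case ElementRef(p, q)                   => s"er${n(p)}=$q"
    case LocalElementDecl(p, pre, local, qualified) =>
      s"e${n(p)}=" + (if (qualified) qname(pre, local) else local)
    case GlobalElementDecl(q)               => s"e=$q"
    case GlobalComplexTypeDef(q)            => s"ct=$q"
    case LocalComplexTypeDef                => "ct"
    case ComplexTypeRef(q)                  => s"ct=$q"
    case GlobalSimpleTypeDef(q)             => s"st=$q"
    case LocalSimpleTypeDef(base)           => s"st=$base"
    case SimpleTypeRef(q)                   => s"st=$q"
    case ChoiceGroup(p)                     => s"c${n(p)}"
    case SequenceGroup(p)                   => s"s${n(p)}"
    case GlobalChoiceGroupDef(q)            => s"cgd=$q"
    case GlobalSequenceGroupDef(q)          => s"sgd=$q"
    case ChoiceGroupRef(p, q)               => s"cgr${n(p)}=$q"
    case SequenceGroupRef(p, q)             => s"sgr${n(p)}=$q"
  }

  /** A whole SSCD: steps joined by ":". */
  def sscd(steps: List[Construct]): String = steps.map(step).mkString(":")
}
```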

Implementation Notes

It is not always possible to form an SSCD for a schema component in a DFDL schema that is not well-formed. For example, suppose a global element declaration is missing its name attribute. There is no way to refer to that problematic part of the schema with an SSCD, because the step must contain the element's QName. In such cases, use an XPath expression that treats the DFDL schema file as an XML document.

Since ":" also separates namespace prefixes from local names in QName syntax, an SSCD cannot be split into path steps simply by splitting on ":".
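One possible splitting approach assumes that the set of namespace prefixes in scope is known. It splits on ":", then re-joins a token of the form kind=prefix with the following token whenever prefix is a known namespace prefix. This is a hypothetical sketch, not a documented algorithm. It would misread an unprefixed local name that happens to equal a namespace prefix.

```scala
object SscdSplitter {
  /** Splits an SSCD into steps, re-joining "kind=prefix" with the local name
    * that follows it when `prefix` is one of `knownPrefixes` (an assumption). */
  def split(sscd: String, knownPrefixes: Set[String]): List[String] = {
    val tokens = sscd.split(":", -1).toList

    @scala.annotation.tailrec
    def loop(rest: List[String], acc: List[String]): List[String] = rest match {
      case tok :: local :: tail if startsPrefixedQName(tok, knownPrefixes) =>
        loop(tail, s"$tok:$local" :: acc)
      case tok :: tail => loop(tail, tok :: acc)
      case Nil         => acc.reverse
    }

    loop(tokens, Nil)
  }

  // True when the token looks like "kind=prefix" and prefix is a known namespace prefix.
  private def startsPrefixedQName(tok: String, prefixes: Set[String]): Boolean = {
    val eq = tok.indexOf('=')
    eq >= 0 && prefixes.contains(tok.substring(eq + 1))
  }
}
```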

Note that a schema component cannot create its SSCD step without knowing its position within its lexically enclosing parent. E.g., the second sequence child of another sequence needs to create a step with a higher N value based on its position.

Since SSCDs will be used in diagnostic messages, the code that creates them must be minimal, and nothing can be allowed to go wrong in it. It must not throw any sort of exception, nor depend on, say, OOLAG LVs. The methods that create SSCDs will catch Throwable and call Assert.abort() if anything is thrown.
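The following Scala sketch shows that policy. Abort and abort are hypothetical stand-ins for Daffodil's Assert.abort(), whose exact behavior is not described here.

```scala
object SafeSscd {
  // Hypothetical stand-in for the error raised by Daffodil's Assert.abort().
  final class Abort(msg: String, cause: Throwable) extends Error(msg, cause)

  def abort(msg: String, cause: Throwable): Nothing = throw new Abort(msg, cause)

  /** Evaluates an SSCD-building expression; any Throwable becomes an abort. */
  def safely(build: => String): String =
    try build
    catch {
      case t: Throwable => abort("Failure while building an SSCD: " + t, t)
    }
}
```

A failure while building a designator is treated as an internal bug rather than an ordinary diagnostic. That is why the sketch converts every Throwable into a single abort path.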

SSCD for DFDL Annotations

This may not be needed, but if we want to specify an SSCD for a specific DFDL annotation, then we use quasi-elements such as dfdl:format, dfdl:sequence, dfdl:simpleType, etc. That is, no steps are needed to represent the xs:annotation or xs:appinfo constructs that enclose long-form annotations.

SSCD for DFDL Schema Files

The one problem is that XML Schema and the W3C SCD provide no means to refer to schema documents; hence, one cannot refer to individual top-level annotations. This is the same bug we see in XSOM and other schema object models.
We solve this by allowing a URI for the schema document, followed by a URI fragment that contains the SSCD for the dfdl:format annotation.
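With java.net.URI, such a designator could be built by keeping the document's scheme and scheme-specific part and putting the SSCD in the fragment. The object and method names are hypothetical. The example assumes the quasi-element dfdl:format as the SSCD.

```scala
import java.net.URI

object SchemaDocDesignator {
  /** Keeps the scheme and scheme-specific part of the schema document's URI
    * and places the SSCD in the fragment (any existing fragment is replaced).
    * The three-argument URI constructor quotes characters illegal in a URI. */
  def designate(schemaDoc: URI, sscd: String): URI =
    new URI(schemaDoc.getScheme, schemaDoc.getSchemeSpecificPart, sscd)
}
```

For `file:/schemas/a.dfdl.xsd`, this yields `file:/schemas/a.dfdl.xsd#dfdl:format`. Because ":" is a legal fragment character, the SSCD text survives unquoted.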
Data Element Occurrence IDs
An occurrence of an element is identified by its path in the infoset, the SSCID of its component instance, and a unique integer called the trip count. The trip count increments each time the SSCID is used, so that backtracking to the same path and SSCID still creates unique occurrence IDs.
The trip count is represented by "(n)" where n is an integer.
When the full ID is too verbose, parts can be omitted. For example, the path can be omitted when it is clear from context.
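Trip-count allocation can be sketched in Scala. The text fixes only the "(n)" notation. The "path sscid(n)" layout, counts starting at 1, and all names below are assumptions.

```scala
import scala.collection.mutable

object OccurrenceIds {
  final case class OccurrenceId(path: String, sscid: String, trip: Int) {
    // Assumed layout: infoset path, a space, then the SSCID with "(n)" appended.
    def full: String = s"$path $sscid($trip)"
    // Shorter form for when the path is clear from context.
    def withoutPath: String = s"$sscid($trip)"
  }

  final class Allocator {
    private val trips = mutable.Map.empty[String, Int]

    /** Each use of an SSCID bumps its trip count, so backtracking to the same
      * path and SSCID still yields a fresh occurrence ID. Counts start at 1. */
    def next(path: String, sscid: String): OccurrenceId = {
      val n = trips.getOrElse(sscid, 0) + 1
      trips.update(sscid, n)
      OccurrenceId(path, sscid, n)
    }
  }
}
```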