Apache Drill provides query capabilities against a variety of data systems.

By enabling Drill for DFDL-described data, one could immediately query data that has a DFDL schema describing its format.

Metadata Mapping

TBD: does Drill support...

  • nullable complex types (a column containing a sub-table, where the column is itself nullable)
  • date/time/datetime types
  • big int, big decimal
  • nullable strings (distinguished from empty strings)
  • namespaces (of some sort)

TBD: should we be trying to simplify the metadata to make querying easier, or be ruthlessly uniform so that queries will be ugly but at least consistent?
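To make the trade-off concrete, here is a minimal sketch of the "simplified" option, in which a nested element is merged into its parent and child names are prefixed with the enclosing element name. The `flatten` helper, the `_` separator, and the sample record are all hypothetical illustrations, not part of Drill or Daffodil.

```python
def flatten(record, prefix="", sep="_"):
    """Merge nested dicts into one level, prefixing child names with the parent name."""
    flat = {}
    for name, value in record.items():
        column = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            # Complex child: recurse and carry the qualified name down.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical infoset fragment: "id" occurs both at the top level and inside "header".
record = {"header": {"id": 7, "length": 12}, "id": 99}
print(flatten(record))  # {'header_id': 7, 'header_length': 12, 'id': 99}
```

The flattened form is easier to query, but a sibling that happened to be named `header_id` would collide with the generated name. The uniform option would avoid that by keeping the nesting as-is.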

TBD: should we be trying to handle XSD here (all of it) or just DFDL?

TBD: we already warn when an element is distinguishable only by its namespace, since namespaces are not represented in JSON, for example. We could likewise warn about anonymous choices or other constructs that make metadata mapping to Drill (or NiFi or ...) harder.
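A check for the namespace case could look roughly like the following sketch. It assumes sibling elements are available as `(namespace_uri, local_name)` string pairs. That input shape and the function name `namespace_only_collisions` are hypothetical, not an existing Daffodil API.

```python
from collections import defaultdict


def namespace_only_collisions(siblings):
    """Return local names that appear under more than one namespace.

    `siblings` is an iterable of (namespace_uri, local_name) string pairs
    for the children of one parent element (hypothetical input shape).
    """
    namespaces_by_name = defaultdict(set)
    for namespace_uri, local_name in siblings:
        namespaces_by_name[local_name].add(namespace_uri)
    # Keep only names that would become ambiguous once namespaces are dropped.
    return {
        name: sorted(uris)
        for name, uris in namespaces_by_name.items()
        if len(uris) > 1
    }


children = [
    ("urn:example:a", "id"),
    ("urn:example:b", "id"),
    ("urn:example:a", "name"),
]
for name, uris in namespace_only_collisions(children).items():
    print(f"warning: element '{name}' is distinguishable only by namespace: {uris}")
```

The same pattern (collect, group, report) could be reused for other constructs that do not map cleanly, such as anonymous choices.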

| Type (of element unless noted) | Nillable (yes/no, * = don't care) | Dimension (scalar, optional, array, * = don't care) | Drill metadata |
|---|---|---|---|
| * | * | array | Sub-table with added index column to hold position (note: name of index column should not collide) |
| date/time | * | * | TBD: are there date/time types corresponding? If so use them, if not use strings in ISO 8601 format |
| string | | | Must map any DFDL infoset illegal string characters to Drill-allowed characters (analogous to what we do with XML-illegal characters for converting the DFDL infoset to XML). |
| string | * | scalar | String (non nullable). TBD: is empty string distinguished from null string in Drill? (ANSI SQL databases distinguish empty strings from null strings. DFDL also distinguishes these. Some other databases do not.) |
| simple type | no | scalar | Corresponding Drill type |
| simple type | yes | scalar | Nullable corresponding Drill type (TBD: no distinction from string. Combine with string if there is no distinction) |
| simple type | no | optional | Nullable corresponding Drill type (TBD: no distinction from string. Combine with string if there is no distinction) |
| simple type | yes | optional | Nullable corresponding Drill type (note: the two concepts of optional and nullable are collapsed) (TBD: no distinction from string. Combine with string if there is no distinction) |
| simple type | no | array | Sub-table with index and non-nullable value column (TBD: no distinction from string. Combine with string if there is no distinction) |
| simple type | yes | array | Sub-table with index and nullable value column (TBD: no distinction from string. Combine with string if there is no distinction) |
| bounded size unsigned integers (excluding unsignedLong) | * | * | Next larger size signed integer |
| unsignedLong | | | TBD: Do we have bignum? TBD: should we just restrict this to range of signed long type? TBD: just use string? |
| integer (unbounded) | | | TBD: Do we have a corresponding type? (If not use string) |
| decimal | | | TBD: Do we have a corresponding type? (If not use string) |
| complex sequence | no | scalar | TBD: merge children into parent context? TBD: extend child element names with enclosing element name? TBD: name collisions? TBD: more than one child with same name? (non-array case) |
| complex sequence | yes | scalar | Sub-table |
| complex sequence | * | optional or array | Sub-table |
| complex choice | | | |
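The simple-type rows of the mapping above can be sketched as a small function. This is only an illustration under stated assumptions: the returned dictionaries are a made-up description format (not Drill's metadata API), the names `map_simple_element` and `index_column_name` are hypothetical, the `int` type for the index column is an assumption, and `unsignedLong`, unbounded `integer`, and `decimal` are passed through unchanged because their mapping is still TBD.

```python
# Widening rule from the mapping: each bounded unsigned XSD type maps to the
# next larger signed type. unsignedLong is omitted because it is still TBD.
UNSIGNED_TO_SIGNED = {
    "unsignedByte": "short",
    "unsignedShort": "int",
    "unsignedInt": "long",
}


def index_column_name(existing_names, base="index"):
    """Pick an index column name that does not collide with existing columns."""
    name = base
    suffix = 1
    while name in existing_names:
        name = f"{base}_{suffix}"
        suffix += 1
    return name


def map_simple_element(xsd_type, nillable, dimension):
    """Describe the column(s) for one simple-typed element.

    `dimension` is "scalar", "optional", or "array". The result is a plain
    dict used only for illustration.
    """
    value_type = UNSIGNED_TO_SIGNED.get(xsd_type, xsd_type)
    if dimension == "array":
        # Arrays become a sub-table: a position index plus the value column.
        return {
            "kind": "sub-table",
            "columns": [
                {"name": index_column_name({"value"}), "type": "int", "nullable": False},
                {"name": "value", "type": value_type, "nullable": nillable},
            ],
        }
    # Optional and nillable collapse into a single nullable column.
    return {
        "kind": "column",
        "type": value_type,
        "nullable": nillable or dimension == "optional",
    }
```

For example, `map_simple_element("unsignedInt", False, "optional")` describes a nullable `long` column, and `index_column_name({"index", "index_1"})` returns `"index_2"`, which is one way to keep the index column from colliding with a child element of the same name.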







