Apache Drill provides query capabilities against a variety of data systems.
Enabling Drill to read DFDL-described data would make any data whose format is described by a DFDL schema immediately queryable.
Metadata Mapping
TBD: does Drill support...
- nullable complex types (that is, a column containing a sub-table, where the column itself is nullable)
- date/time/datetime types
- big int, big decimal
- nullable strings (distinguished from empty strings)
- namespaces (of some sort)
TBD: should we be trying to simplify the metadata to make querying easier, or be ruthlessly uniform so that queries will be ugly but at least consistent?
TBD: should we be trying to handle all of XSD here, or just DFDL?
TBD: we warn when an element is distinguishable only by its namespace, because namespaces are not represented in, for example, JSON. In the same way, we could warn about anonymous choices or other constructs that make metadata mapping to Drill (or NiFi or ... ) harder.
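The namespace warning described above could be sketched as follows. This is only an illustration: the function name `namespace_only_collisions` and the input shape (pairs of namespace and local name for sibling elements) are assumptions made for the example, not part of Daffodil or Drill.

```python
from collections import defaultdict


def namespace_only_collisions(elements):
    """Return local names that occur under more than one namespace.

    `elements` is an iterable of (namespace, local_name) string pairs for
    sibling elements. Both the input shape and this helper are hypothetical.
    """
    namespaces_by_name = defaultdict(set)
    for namespace, local_name in elements:
        namespaces_by_name[local_name].add(namespace)
    # Keep only names that a namespace-free target (JSON, Drill columns)
    # could not tell apart.
    return {
        name: sorted(namespaces)
        for name, namespaces in namespaces_by_name.items()
        if len(namespaces) > 1
    }


siblings = [
    ("urn:a", "length"),
    ("urn:b", "length"),
    ("urn:a", "payload"),
]
collisions = namespace_only_collisions(siblings)
for name, namespaces in collisions.items():
    print(f"warning: element '{name}' is distinguishable only by namespace: {namespaces}")
```

A similar check could flag anonymous choices, but what exactly to report for those is still open.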
type (of element unless noted) | nillable (yes/no, * = don't care) | dimension (scalar, optional, array, * = don't care) | Drill metadata
---|---|---|---
* | * | array | sub-table with an added index column to hold position (note: the name of the index column should not collide with other names)
date/time | * | * | TBD: are there corresponding date/time types? If so, use them; if not, use strings in ISO 8601 format
string | * | * | Must map any DFDL infoset string characters that Drill does not allow to Drill-allowed characters (analogous to what we do with XML-illegal characters when converting the DFDL infoset to XML).
string | * | scalar | String (non-nullable) TBD: is an empty string distinguished from a null string in Drill? (ANSI SQL databases distinguish empty strings from null strings, and so does DFDL. Some other databases do not.)
simple type | no | scalar | corresponding Drill type
simple type | yes | scalar | nullable corresponding Drill type (TBD: no distinction from string. Combine with string if there is no distinction)
simple type | no | optional | nullable corresponding Drill type (TBD: no distinction from string. Combine with string if there is no distinction)
simple type | yes | optional | nullable corresponding Drill type (note: the two concepts of optional and nullable are collapsed) (TBD: no distinction from string. Combine with string if there is no distinction)
simple type | no | array | sub-table with index and non-nullable value column (TBD: no distinction from string. Combine with string if there is no distinction)
simple type | yes | array | sub-table with index and nullable value column (TBD: no distinction from string. Combine with string if there is no distinction)
bounded size unsigned integers (excluding unsignedLong) | * | * | next larger size signed integer
unsignedLong | * | * | TBD: do we have bignum? TBD: should we just restrict this to the range of the signed long type? TBD: just use string?
integer (unbounded) | * | * | TBD: do we have a corresponding type? (If not, use string)
decimal | * | * | TBD: do we have a corresponding type? (If not, use string)
complex sequence | no | scalar | TBD: merge children into parent context? TBD: extend child element names with enclosing element name? TBD: name collisions? TBD: more than one child with same name? (non-array case)
complex sequence | yes | scalar | sub-table
complex sequence | * | optional or array | sub-table
complex choice | | | |
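The simple-type rows of the table can be read as a small set of rules. The sketch below is one possible reading, not a settled design. It covers only the rows that are not TBD: nillable and optional collapsing into nullable, arrays becoming a sub-table with an index column, and bounded unsigned integers widening to the next larger signed size. Every name in it (`Column`, `map_simple_element`, the `intNN` type labels, the `index` base name and its 64-bit type) is hypothetical and does not refer to any Drill or Daffodil API.

```python
from dataclasses import dataclass

# XSD bounded unsigned types and the bit width of the next larger signed integer.
WIDENED_SIGNED_BITS = {"unsignedByte": 16, "unsignedShort": 32, "unsignedInt": 64}


@dataclass(frozen=True)
class Column:
    name: str
    value_type: str  # placeholder label, not an actual Drill type name
    nullable: bool


def value_type_for(xsd_type):
    """Widen bounded unsigned integers; pass every other type through."""
    bits = WIDENED_SIGNED_BITS.get(xsd_type)
    return f"int{bits}" if bits else xsd_type


def index_column_name(taken_names, base="index"):
    """Pick an index column name that does not collide with `taken_names`."""
    name, suffix = base, 0
    while name in taken_names:
        suffix += 1
        name = f"{base}_{suffix}"
    return name


def map_simple_element(name, xsd_type, nillable, dimension):
    """Return the columns for one simple-typed element."""
    value_type = value_type_for(xsd_type)
    if dimension == "array":
        # Sub-table: a position column plus the value column.
        # The 64-bit index type is an assumption made for this sketch.
        index = index_column_name({name})
        return [Column(index, "int64", False), Column(name, value_type, nillable)]
    if dimension not in ("scalar", "optional"):
        raise ValueError(f"unknown dimension: {dimension}")
    # Optional and nillable both collapse into a nullable column.
    return [Column(name, value_type, nillable or dimension == "optional")]


print(map_simple_element("count", "unsignedShort", False, "scalar"))
print(map_simple_element("index", "int", True, "array"))
```

Strings, date/time types, unsignedLong, unbounded integers, decimals, and complex types are left out because their mapping is still open in the table.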