Apache Drill provides query capabilities against a variety of data systems.
By enabling Drill for DFDL-described data, one could immediately query data that has a DFDL schema describing its format.
Metadata Mapping
TBD: does Drill support...
- nullable complex types (that is, a column containing a sub-table, where the column itself is nullable)
- date/time/datetime types
- big int, big decimal
- nullable strings (distinguished from empty strings)
TBD: should we be trying to simplify the metadata to make querying easier, or be ruthlessly uniform so that queries will be ugly but at least consistent?
TBD: should we be trying to handle XSD here (all of it) or just DFDL?
type | nillable (yes/no, * = don't care) | dimension (scalar, optional, array, * = don't care) | Drill metadata | |
---|---|---|---|---|
* | * | array | sub-table with an added index column to hold position (note: the index column's name must not collide with any other column name) | |
date/time | * | * | TBD: does Drill have corresponding date/time types? If so, use them; if not, use strings in ISO 8601 format | |
string | * | * | Must map any characters in DFDL infoset strings that Drill does not allow to Drill-allowed characters (analogous to how XML-illegal characters are handled when converting the DFDL infoset to XML) | |
string | * | scalar | String (non-nullable). TBD: does Drill distinguish an empty string from a null string? (ANSI SQL databases distinguish the two, as does DFDL; some other databases do not) | |
simple type | no | scalar | corresponding Drill type | |
simple type | yes | scalar | nullable corresponding Drill type (TBD: is this any different from the string case? If not, combine with string) | |
simple type | no | optional | nullable corresponding Drill type (TBD: is this any different from the string case? If not, combine with string) | |
simple type | yes | optional | nullable corresponding Drill type (note: the two concepts of optional and nullable are collapsed) (TBD: is this any different from the string case? If not, combine with string) | |
simple type | no | array | sub-table with index and non-nullable value column (TBD: is this any different from the string case? If not, combine with string) | |
simple type | yes | array | sub-table with index and nullable value column (TBD: is this any different from the string case? If not, combine with string) | |
bounded-size unsigned integers (excluding unsignedLong) | * | * | next larger signed integer type | |
unsignedLong | * | * | TBD: does Drill have a bignum type? TBD: should we just restrict this to the range of the signed long type? TBD: just use string? | |
integer (unbounded) | * | * | TBD: does Drill have a corresponding type? (If not, use string) | |
decimal | * | * | TBD: does Drill have a corresponding type? (If not, use string) | |
complex sequence | no | scalar | TBD: merge children into parent context? TBD: extend child element names with enclosing element name? TBD: name collisions? TBD: more than one child with same name? (non-array case) | |
complex sequence | yes | scalar | sub-table | |
complex sequence | * | optional or array | sub-table | |
complex choice | | | | |
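
The "simple type" and bounded unsigned integer rows above can be read as a small decision function. The following is only a sketch of that reading, not an implementation against any Drill API: the names `ColumnShape`, `WIDEN_UNSIGNED`, and `map_simple_type` are hypothetical, and result types are given as XSD type names rather than Drill type names, since the corresponding Drill types are still TBD. Strings, unsignedLong, unbounded integer, decimal, and complex types are left out because their rows are still open.

```python
from dataclasses import dataclass

# Bounded XSD unsigned types and the next larger signed XSD type.
# unsignedLong is deliberately absent: its mapping is still TBD in the table.
WIDEN_UNSIGNED = {
    "unsignedByte": "short",   # 8-bit unsigned fits in 16-bit signed
    "unsignedShort": "int",    # 16-bit unsigned fits in 32-bit signed
    "unsignedInt": "long",     # 32-bit unsigned fits in 64-bit signed
}


@dataclass(frozen=True)
class ColumnShape:
    """Hypothetical description of what a DFDL element becomes on the query side."""
    kind: str           # "column" or "sub-table"
    value_type: str     # XSD type name of the value (after unsigned widening)
    nullable: bool      # whether the value column may hold null
    index_column: bool  # whether an added position column is present


def map_simple_type(xsd_type: str, nillable: bool, dimension: str) -> ColumnShape:
    """Apply the 'simple type' rows of the table to one element declaration."""
    if dimension not in ("scalar", "optional", "array"):
        raise ValueError(f"unknown dimension: {dimension!r}")

    # Bounded unsigned integers widen; every other type name passes through.
    value_type = WIDEN_UNSIGNED.get(xsd_type, xsd_type)

    if dimension == "array":
        # Arrays become a sub-table with an index column; only nillable
        # elements get a nullable value column.
        return ColumnShape("sub-table", value_type, nullable=nillable, index_column=True)

    # Scalar or optional: optional and nillable collapse into "nullable".
    return ColumnShape(
        "column",
        value_type,
        nullable=nillable or dimension == "optional",
        index_column=False,
    )
```

For example, `map_simple_type("unsignedShort", False, "optional")` yields a nullable `int` column, which shows both the widening rule and the collapse of optional into nullable. Whether that collapse loses information a query would need is one of the open questions above.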