...
It has to generate the same execution plan as H2 for commonly used queries (co-located queries) - only two phases, this means there is no intermediate local task having a Sender on top of execution sub-graph and a Receiver at the bottom for such query (except cases when such behavior is forced by hints - it's helpful to delegate results aggregation to server nodes in case a requesting client have a little free memory)
- It has to provide an ability to execute any non-recursive non-collocated queries in reasonable period of time.
- It has to provide memory management abilities to defend the application from OOM (memory quotes, using disk for intermediate results, etc)
- It has to provide SQL enhancement abilities (system functions, user defined functions, hints, etc) to execute
- It should generate optimal execution plan for for non-recursive non-collocated queries taking into consideration two factors: a) transferring data amount, b) each local subtask execution complexity.
- It should provide enhancement points for future improvements (new transformation rules, different source data structure types support - indexes and tables initially and prefix trees or spatial indexes in future, possible column based storage support in future, etc)
The list may be increased.
Expected integration steps:
- Ignite logical convention implementing (Relational graph nodes, converter rules), so, Calcite can use Ignite's own operations costs, we have a control on what variant of graph is preferable.
- Index Scan rules implementing - Apache Phoenix experience may be reused. Range filters, sorted scans, some projections transform into index scans.
- Exchange related rules implementing (affinity aware) - Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types needed.
- Sender/Receiver infrastructure implementing. - Each Exchange rewrites into a pair of Receiver and Sender where Receiver is a relation node and Sender is an infrastructure object which is used to stream target Exchange subgraph result to a particular remote receiver.
- Physical convention implementing - as a start point we may use one of provided by Calcite conventions (Bindable, Enumerable, Interpretable) rewriting particular relational nodes and converter/transform rules into our own implementations one by one.
...