...

  1. It has to generate the same execution plan as H2 for commonly used (co-located) queries: only two phases. This means that for such a query there is no intermediate local task with a Sender on top of the execution sub-graph and a Receiver at the bottom. The exception is when such behavior is forced by hints, which is helpful for delegating result aggregation to server nodes when the requesting client has little free memory.

  2. It has to provide the ability to execute any non-recursive non-collocated query in a reasonable period of time.
  3. It has to provide memory management abilities to protect the application from OOM (memory quotas, using disk for intermediate results, etc.)
  4. It has to provide SQL enhancement abilities (system functions, user defined functions, hints, etc) to execute 
  5. It should generate an optimal execution plan for non-recursive non-collocated queries, taking two factors into consideration: a) the amount of data transferred, b) the execution complexity of each local subtask.
  6. It should provide extension points for future improvements (new transformation rules, support for different source data structure types: indexes and tables initially, prefix trees or spatial indexes in the future, possibly column-based storage later, etc.)

The list may be extended.
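The two cost factors named in requirement 5 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, the weight is arbitrary, and none of this is Ignite or Calcite API. It only shows one way to combine the amount of data transferred and the local execution complexity into a single comparable number.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical cost model sketch; not part of Ignite or Calcite. */
final class PlanCostSketch {
    /** One local subtask of a distributed plan. */
    static final class Subtask {
        final String name;
        final double localCost; // assumed abstract units of local work
        final long bytesSent;   // assumed amount of data sent to other nodes

        Subtask(String name, double localCost, long bytesSent) {
            this.name = name;
            this.localCost = localCost;
            this.bytesSent = bytesSent;
        }
    }

    /** Assumed weight: cost of sending one byte relative to one unit of local work. */
    static final double NETWORK_WEIGHT = 0.01;

    /** Total plan cost = sum over subtasks of (local cost + weighted transfer). */
    static double totalCost(List<Subtask> plan) {
        double cost = 0;
        for (Subtask t : plan)
            cost += t.localCost + NETWORK_WEIGHT * t.bytesSent;
        return cost;
    }

    /** Returns the plan with the lower total cost (the first one on a tie). */
    static List<Subtask> cheaper(List<Subtask> a, List<Subtask> b) {
        return totalCost(a) <= totalCost(b) ? a : b;
    }

    public static void main(String[] args) {
        List<Subtask> broadcastJoin = Arrays.asList(
            new Subtask("scan small table and broadcast", 100, 50_000),
            new Subtask("local join", 1_000, 2_000));
        List<Subtask> repartitionJoin = Arrays.asList(
            new Subtask("scan and rehash both tables", 300, 400_000),
            new Subtask("local join", 800, 2_000));

        System.out.println("broadcast:   " + totalCost(broadcastJoin));
        System.out.println("repartition: " + totalCost(repartitionJoin));
        System.out.println("chosen:      "
            + (cheaper(broadcastJoin, repartitionJoin) == broadcastJoin ? "broadcast" : "repartition"));
    }
}
```

In a real planner the two factors would come from table statistics and operator cost formulas rather than hard-coded numbers; the sketch only makes the trade-off explicit.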

Expected integration steps:

  1. Implement the Ignite logical convention (relational graph nodes, converter rules), so that Calcite can use Ignite's own operation costs and we control which variant of the graph is preferable.
  2. Implement Index Scan rules; Apache Phoenix experience may be reused. Range filters, sorted scans, and some projections are transformed into index scans.
  3. Implement Exchange-related (affinity-aware) rules; Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types are needed.
  4. Implement the Sender/Receiver infrastructure. Each Exchange is rewritten into a pair of a Receiver and a Sender, where the Receiver is a relational node and the Sender is an infrastructure object used to stream the result of the target Exchange subgraph to a particular remote receiver.
  5. Implement the physical convention. As a starting point we may use one of the conventions provided by Calcite (Bindable, Enumerable, Interpretable), rewriting particular relational nodes and converter/transform rules into our own implementations one by one.
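The Exchange rewrite described in step 4 can be sketched without any Calcite dependency. The classes below are hypothetical stand-ins (they are not Calcite's `RelNode` or any Ignite type); the sketch assumes a plain tree of named operators and shows each Exchange being replaced by a Receiver node, while its input subgraph is handed to a Sender object.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting a plan at Exchange nodes; not Ignite or Calcite API. */
final class ExchangeSplitSketch {
    /** The distribution types listed in step 3. */
    enum Distribution { SINGLETON, RANDOM, HASH, BROADCAST }

    /** Minimal relational node: an operator name plus its inputs. */
    static class Rel {
        final String name;
        final List<Rel> inputs = new ArrayList<>();

        Rel(String name, Rel... inputs) {
            this.name = name;
            for (Rel in : inputs)
                this.inputs.add(in);
        }
    }

    /** Marks a point where data has to move between nodes. */
    static final class Exchange extends Rel {
        final Distribution distribution;

        Exchange(Distribution distribution, Rel input) {
            super("Exchange", input);
            this.distribution = distribution;
        }
    }

    /** Relational node that reads rows produced by the Sender with the same id. */
    static final class Receiver extends Rel {
        final int exchangeId;

        Receiver(int exchangeId) {
            super("Receiver");
            this.exchangeId = exchangeId;
        }
    }

    /** Infrastructure object (not a relational node) that streams a subgraph result. */
    static final class Sender {
        final int exchangeId;
        final Distribution distribution;
        final Rel subgraph;

        Sender(int exchangeId, Distribution distribution, Rel subgraph) {
            this.exchangeId = exchangeId;
            this.distribution = distribution;
            this.subgraph = subgraph;
        }
    }

    /**
     * Rewrites the tree bottom-up and in place: every Exchange becomes a Receiver,
     * and its input subgraph is registered as a Sender. Returns the new root.
     */
    static Rel split(Rel node, List<Sender> senders) {
        for (int i = 0; i < node.inputs.size(); i++)
            node.inputs.set(i, split(node.inputs.get(i), senders));

        if (node instanceof Exchange) {
            Exchange ex = (Exchange) node;
            int id = senders.size();
            senders.add(new Sender(id, ex.distribution, ex.inputs.get(0)));
            return new Receiver(id);
        }
        return node;
    }

    public static void main(String[] args) {
        Rel plan = new Rel("Aggregate",
            new Exchange(Distribution.SINGLETON, new Rel("Filter", new Rel("Scan"))));

        List<Sender> senders = new ArrayList<>();
        Rel root = split(plan, senders);

        System.out.println("root fragment: " + root.name + " <- " + root.inputs.get(0).name);
        for (Sender s : senders)
            System.out.println("sender " + s.exchangeId + " (" + s.distribution + "): " + s.subgraph.name);
    }
}
```

Because the rewrite is bottom-up, a nested Exchange is split first, so the subgraph of an outer Sender already contains the Receiver of the inner one.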

...