Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


IDIEP-37
AuthorAuthors
Sponsor
Created 06 Sep 2019
Status
Status
colourGrey
titleDRAFT

...

  • No data co-location control, i.e. arbitrary data can be returned silently
  • Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes.
  • Limited execution modes: either two-phase execution (default) or "distributed joins" which adds one more phase
  • Lack of proper planner which will take in count both data distribution and data statistics
  • H2 optimizer is very primitive. It can do only predicates push down, join order choosing and also some minor optimizations. It lacks of many useful optimizations like this one 
    Jira
    serverASF JIRA
    columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
    serverId5aa69414-a9e9-3523-82ec-879b028fb15b
    keyIGNITE-6085

Description

The approach to solve the limitations implies more complex execution flow that brings a new abstraction: execution graph.

...

There are several example of successful Calcite integrations (Apache Drill, Apache Flink, Hive, etc)

Calcite based SQL engine requirements.

  1. It has to generate the same execution plan as H2 for commonly used queries (co-located queries)

  2. It has to execute 
  3. It should generate optimal execution plan for non-collocated

The integration steps:

  1. Ignite logical convention implementing (Relational graph nodes, converter rules), so, Calcite can use Ignite's own operations costs, we have a control on what variant of graph is preferable.
  2. Index Scan rules implementing - Apache Phoenix experience may be reused. Range filters, sorted scans, some projections transform into index scans.
  3. Exchange related rules implementing (affinity aware) - Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types needed.
  4. Sender/Receiver infrastructure implementing. - Each Exchange rewrites into a pair of Receiver and Sender where Receiver is a relation node and Sender is an infrastructure object which is used to stream target Exchange subgraph result to a particular remote receiver.
  5. Physical convention implementing - as a start point we may use one of provided by Calcite conventions (Bindable, Enumerable, Interpretable) rewriting particular relational nodes and converter/transform rules into our own implementations one by one.

...