ID

IEP-37

AuthorAuthors

Sponsor

Created

06 Sep 2019

Status

...

No data co-location control, i.e. arbitrary data can be returned silently
Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes.
Limited execution modes: either two-phase execution (default) or "distributed joins" which adds one more phase
Lack of proper planner which will take in count both data distribution and data statistics
H2 optimizer is very primitive. It can do only predicates push down, join order choosing and also some minor optimizations. It lacks of many useful optimizations like this one
Jira
server ASF JIRA
columns key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId 5aa69414-a9e9-3523-82ec-879b028fb15b
key IGNITE-6085
.

Description

The approach to solve the limitations implies more complex execution flow that brings a new abstraction: execution graph.

...

There are several example of successful Calcite integrations (Apache Drill, Apache Flink, Hive, etc)

Calcite based SQL engine requirements.

It has to generate the same execution plan as H2 for commonly used queries (co-located queries)
It has to execute
It should generate optimal execution plan for non-collocated

Ignite logical convention implementing (Relational graph nodes, converter rules), so, Calcite can use Ignite's own operations costs, we have a control on what variant of graph is preferable.
Index Scan rules implementing - Apache Phoenix experience may be reused. Range filters, sorted scans, some projections transform into index scans.
Exchange related rules implementing (affinity aware) - Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types needed.
Sender/Receiver infrastructure implementing. - Each Exchange rewrites into a pair of Receiver and Sender where Receiver is a relation node and Sender is an infrastructure object which is used to stream target Exchange subgraph result to a particular remote receiver.
Physical convention implementing - as a start point we may use one of provided by Calcite conventions (Bindable, Enumerable, Interpretable) rewriting particular relational nodes and converter/transform rules into our own implementations one by one.

...