Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

IDIEP-37
Authors
Sponsor
Created 06 Sep 2019
Status
Status
colourGrey
titleDRAFT


Table of Contents

Motivation

Ignite SQL engine is used across hundreds of deployments who are accelerating relational databases or use Ignite as a system of records. However, these days we see that many more companies are migrating to distributed databases that speak SQL natively. With a larger adoption comes a higher demand for SQL support that is comparable to RDBMS. For instance, if a couple of years ago 1 out of 10 use cases needed support for multi-joins queries or queries with subselects or efficient memory usage then today there are 5 out of 10 use cases of this kind; in the foreseeable future, it will be a 10 out of 10. So, the evolution and a major adoption of the distributed databases is in progress -- the relational world goes distributed. In result, it's getting time-consuming for both Ignite SQL maintainers (and experts who help to tune it for production usage) to carry on by having a dependency on H2.

...

The key point in the aforementioned plan is to have a relational algebra execution plan which can undergo arbitrary equivalence transformations by the rules of relational algebra. There is a well-studied optimization approach used in many production systems [3] for optimizing query execution plans. 

Description

The idea of this IEP is to introduce all missing intermediate steps into the query execution flow and operate on query execution graph.

...

Optimized query execution graph (or query plan) used to build final execution tasks.

Example:


Initial query:

SELECT t1.name, t2.name as projectName FROM Persons t1, Projects t2 where t1.id == t2.responsiblePersonId


Let's assume there is no collocation and the data placed on different nodes.

Initial execution graph:
Project (t1.name name, t2.name projectName)
Join (t1.id == t2.responsiblePersonId)
Scan (Persons t1)
Scan (Projects t2)
Transformed graph:
Exchange (SINGLE) // collecting
Project (t1.name name, t2.name projectName)
Join (t1.id == t2.id)
Exchange (HASH t1.id) // repartitioning
Project (t1.name name, t1.id id)
Scan (Persons t1)
Exchange (HASH t2.id) // repartitioning
Project (t2.name name, t2.responsiblePersonId id)
Scan (Projects t2)
Split tasks:

1) Executes on a client node:

...

There are several example of successful Calcite integrations (Apache Drill, Apache Flink, Hive, etc)

Calcite based SQL engine requirements.

  1. It has to generate the same execution plan as H2 for commonly used queries (co-located queries) - only two phases, this means there is no intermediate local task having a Sender on top of execution sub-graph and a Receiver at the bottom for such query (except cases when such behavior is forced by hints - it's helpful to delegate results aggregation to server nodes in case a requesting client have a little free memory)

  2. It has to provide an ability to execute any non-recursive non-collocated queries in reasonable period of time.
  3. It has to provide memory management abilities to defend the application from OOM (memory quotes, using disk for intermediate results, etc)
  4. It has to provide SQL enhancement abilities (system functions, user defined functions, hints, etc)
  5. It has to generate optimal execution plan for non-recursive non-collocated queries taking into consideration two factors: a) transferring data amount, b) each local subtask execution complexity.
  6. It has to provide enhancement points for future improvements (new transformation rules, different source data structure types support - indexes and tables initially and prefix trees or spatial indexes in future, possible column based storage support in future, etc)

The list may be increased.

Implementation details.

Apache Calcite uses graph based representation of relational operators, each node has node specific meta information (join type, projection columns, sort operation direction) and general outgoing data properties (traits in terms of Calcite). Each node may have limited count of trait types (equal to count of registered Trait definitions for a planner). Trait types are defined for whole graph before its construction. Calcite framework transforms nodes (node specific properties), data traits and even nodes position based on initial nodes meta information, data traits and relative position.

...

Gliffy Diagram
nameOptimized physical plan
pagePin1

Expected integration steps:

  1. Ignite logical convention implementing (Relational graph nodes, converter rules), so, Calcite can use Ignite's own operations costs, we have a control on what variant of graph is preferable.
  2. Index Scan rules implementing - Apache Phoenix experience may be reused. Range filters, sorted scans, some projections transform into index scans.
  3. Exchange related rules implementing (affinity aware) - Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types needed.
  4. Sender/Receiver infrastructure implementing. - Each Exchange rewrites into a pair of Receiver and Sender where Receiver is a relation node and Sender is an infrastructure object which is used to stream target Exchange subgraph result to a particular remote receiver.
  5. Physical convention implementing - as a start point we may use one of provided by Calcite conventions (Bindable, Enumerable, Interpretable) rewriting particular relational nodes and converter/transform rules into our own implementations one by one.




Risks and Assumptions

The main issue is the new Calcite based engine (the engine) is completely different to current one. At first the engine will available via internal API. We need really good test coverage to make sure the feature works as expected in all possible scenarios.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-td43724.html

Reference Links

[1] https://arxiv.org/pdf/1802.10233.pdf

...

https://drill.apache.org/

Tickets

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyIGNITE-12248

...