...
ID | IEP-37 | ||||||||
Authors | |||||||||
Sponsor | |||||||||
Created | 06 Sep 2019 | ||||||||
Status |
|
Table of Contents |
---|
Ignite SQL engine is used across hundreds of deployments who are accelerating relational databases or use Ignite as a system of records. However, these days we see that many more companies are migrating to distributed databases that speak SQL natively. With a larger adoption comes a higher demand for SQL support that is comparable to RDBMS. For instance, if a couple of years ago 1 out of 10 use cases needed support for multi-joins queries or queries with subselects or efficient memory usage then today there are 5 out of 10 use cases of this kind; in the foreseeable future, it will be a 10 out of 10. So, the evolution and a major adoption of the distributed databases is in progress -- the relational world goes distributed. In result, it's getting time-consuming for both Ignite SQL maintainers (and experts who help to tune it for production usage) to carry on by having a dependency on H2.
Ignite community is willing to work on a prototype of alternate SQL engines and selected Apache Calcite as the first candidate.
Info |
---|
Intro to Apache Calcite: |
Below you can see a list of the limitations of the current SQL engine outlined technicallyCurrent SQL engine has a number of critical limitations:
SELECT t1.a, t2.b FROM t1, t2 WHERE t1.id = t2.id
- silently returns wrong results in case of t1 and t2 are not co-locatedSELECT * FROM person p WHERE p.salary > (SELECT avg(salary) from person)
- doesn’t work at all because it cannot be executed in two stepsJira | ||||||||
---|---|---|---|---|---|---|---|---|
|
Jira | ||||||||
---|---|---|---|---|---|---|---|---|
|
...
The key point in the aforementioned plan is to have a relational algebra execution plan which can undergo arbitrary equivalence transformations by the rules of relational algebra. There is a well-studied optimization approach used in many production systems [3] for optimizing query execution plans.
The idea of this IEP is to introduce all missing intermediate steps into the query execution flow and operate on query execution graph.
...
Optimized query execution graph (or query plan) used to build final execution tasks.
SELECT t1.name, t2.name as projectName FROM Persons t1, Projects t2 where t1.id == t2.responsiblePersonId
Let's assume there is no collocation and the data placed on different nodes.
Project (t1.name name, t2.name projectName)
Join (t1.id == t2.responsiblePersonId)
Scan (Persons t1)
Scan (Projects t2)
Exchange (SINGLE) // collecting
Project (t1.name name, t2.name projectName)
Join (t1.id == t2.id)
Exchange (HASH t1.id) // repartitioning
Project (t1.name name, t1.id id)
Scan (Persons t1)
Exchange (HASH t2.id) // repartitioning
Project (t2.name name, t2.responsiblePersonId id)
Scan (Projects t2)
1) Executes on a client node:
...
There are several example of successful Calcite integrations (Apache Drill, Apache Flink, Hive, etc)
...
It has to generate Generate the same execution plan as H2 for commonly used queries (co-located queries) - only two phases, this means there is no intermediate local task having a Sender on top of execution sub-graph and a Receiver at the bottom for such query (except cases when such behavior is forced by hints - it's helpful to delegate results aggregation to server nodes in case a requesting client have a little free memory).
The list may be increased.
...
Apache Calcite uses graph based representation of relational operators, each node has node specific meta information (join type, projection columns, sort operation direction) and general outgoing data properties (traits in terms of Calcite). Each node may have limited count of trait types (equal to count of registered Trait definitions for a planner). Trait types are defined for whole graph before its construction. Calcite framework transforms nodes (node specific properties), data traits and even nodes position based on initial nodes meta information, data traits and relative position.
...
Gliffy Diagram | ||||||
---|---|---|---|---|---|---|
|
...
The main issue is the new Calcite based engine (the engine) is completely different to current one. At first the engine will available via internal API. We need really good test coverage to make sure the feature works as expected in all possible scenarios.
http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-td43724.html
[1] https://arxiv.org/pdf/1802.10233.pdf
...
Apache Calcite-powered SQL Engine Roadmap
Jira | ||||||
---|---|---|---|---|---|---|
|
Tickets caused by H2 limitations:
Jira | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|