Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...


Table of Contents

Motivation

Ignite SQL engine is used across hundreds of deployments who are accelerating relational databases or use Ignite as a system of records. However, these days we see that many more companies are migrating to distributed databases that speak SQL natively. With a larger adoption comes a higher demand for SQL support that is comparable to RDBMS. For instance, if a couple of years ago 1 out of 10 use cases needed support for multi-joins queries or queries with subselects or efficient memory usage then today there are 5 out of 10 use cases of this kind; in the foreseeable future, it will be a 10 out of 10. So, the evolution and a major adoption of the distributed databases is in progress -- the relational world goes distributed. In result, it's getting time-consuming for both Ignite SQL maintainers (and experts who help to tune it for production usage) to carry on by having a dependency on H2.

Ignite community is willing to work on a prototype of alternate SQL engines and selected Apache Calcite as the first candidate.

Below you can see a list of the limitations of the current SQL engine outlined technicallyCurrent SQL engine has a number of critical limitations:

  • Query execution is hard-coded to a primitive map-reduce flow (SQL query is split into a 'map query' and 'reduce query'). This flow is fundamentally incomplete as it is impossible to execute the following queries:
    • SELECT t1.a, t2.b FROM t1, t2 WHERE t1.id = t2.id - silently returns wrong results in case of t1 and t2 are not co-located
    • SELECT * FROM person p WHERE p.salary > (SELECT avg(salary) from person) - doesn’t work at all because it cannot be executed in two steps
    • Jira
      serverASF JIRA
      columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
      serverId5aa69414-a9e9-3523-82ec-879b028fb15b
      keyIGNITE-11448
  • Query execution is done using H2 query engine which itself has several issues
    • Low control on how query executes internally, as a result, we have limited possibility to implement improvements/fixes (H2 is outside of ASF)
    • H2 is a local database and has no notion of distributed query execution
    • H2 lacks the proper planner which will take in count both data distribution and data statistics
    • H2 optimizer is very primitive. It can do only predicates push down, join order choosing and also some minor optimizations. It lacks many useful optimizations like the following 
      Jira
      serverASF JIRA
      columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
      serverId5aa69414-a9e9-3523-82ec-879b028fb15b
      keyIGNITE-6085

...