You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 15 Next »

IDIEP-24
AuthorVladimir Ozerov Ozerov
SponsorVladimir Ozerov Ozerov
Created19 Jun 2018
StatusDRAFT


Motivation

SQL query may scan arbitrary set of cache values. In general case interested values may reside on every cluster node, so broadcast is needed. What is worse, when distributed joins are enabled, another broadcast for every broadcasted message may be needed. One of the most important optimization techniques is so-called "partition pruning", allowing to calculate a set of partitions and nodes which will return empty result without actual query execution. Query parts are not executed on these nodes, increasing overall system throughput and providing nearly linear scaling. 

Recent researches in distributed query processing assume that CPU is relatively cheap resource, while network IO and disk IO are equally costly operations. Partition pruning not only saves network requests, but also avoids unnecessary execution of local query parts, preventing page cache trashing and additional disk reads (mostly random). 

Thus, partition pruning is a critical optimization technique for both distributed and local queries and should be treated as high priority task for Ignite SQL engine.

Partition pruning is already implemented in Apache Ignite in very simplified form [1]. Only WHERE condition with equality is considered and only for SQL queries without joins. We should expand it further.

[1] https://issues.apache.org/jira/browse/IGNITE-4509

Theory

This technique is used excessively by both distributed databases and classical RDBMS vendors (as a part of their partitioning/sharding solutions). First, query is analyzed and optionally re-written to allow for better partition pruning. Second, information of interesting partitions are extracted from query for every table (cache) and subquery (for complex multi-stage distributed queries). 

Query Rewrite

WHERE conditions and JOIN-s are relatively simple targets for partition extraction. The main difficulty is subqueries, as they potentially introduce unlimited nesting that complicates analysis a lot. So the main goal of query rewrite is to minimize number of subqueries. Main techniques are JOIN conversion, correlation and materialization. Interestingly, all these techniques not only simplify partition extraction, but also allow optimizer to choose better execution plan for local execution, as JOINs are much easier to be analyzed and reordered than subqueries. As we shall see below, efforts are usually focused on non-correlated subqueries as they are the most costly for straightforward execution.

JOIN conversion

The goal is to convert a subquery to join if possible. Important prerequisite is that number of resulting rows is not changed. Outer(FK) → inner(PK) joins are possible. E.g. subqueries in SELECT (aka "derived table") and FROM clauses.

Example 1: JOIN conversion for derived table

Before
SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
FROM emp
WHERE emp.salary>1000
After
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id

Example 2: JOIN conversion for FROM clause

Before
SELECT emp.name, dept_subquery.name
FROM emp, (SELECT * FROM dept WHERE state='CA') dept_subquery
WHERE emp.salary>1000 AND emp.dept_id=dept_subquery.id
After
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA'

Also, it is possible for some IN-clauses. MariaDB calls it "table pullout optimization" [1]

Example 3: Table pullout optimization

Before
SELECT emp.name
FROM emp
WHERE emp.salary>1000 AND emp.dept_id IN (SELECT id FROM dept WHERE state='CA')
After
SELECT emp.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA'

Another example where inner subquery return at most one row is outer(PK) → inner(FK GROUP BY PK) case. Subquery is not eliminated in this case, but IN expression is replaced with more comfortable JOIN.

Example 4: Replacing IN with JOIN

Before
SELECT dept.name
FROM dept
WHERE dept.id IN (SELECT emp.dept_id FROM emp GROUP BY emp.dept_id HAVING COUNT(*)>10)
After
SELECT dept.name
FROM dept JOIN (SELECT emp.dept_id FROM emp GROUP BY emp.dept_id HAVING COUNT(*)>10) emp_subquery ON dept.id=emp_subquery.dept_id

Correlation

Uncorrelated subquery may need to scan huge result set. If underlying cache is PARTITIONED, then multiple nodes need to be scanned. If underlying cache is REPLICATED, then scan is local. If previous optimizations are not applicable, we can try to push-down condition from outer table to subquery. If condition is equality and both outer and inner columns are co-located, then potentially distributed query is reduced to co-located query. The downside is that subquery must be evaluated multiple times, once for every tuple from outer relation. This might be less than optimal comparing to subquery materialization explained below.  MySQL refer to this as IN-to-EXISTS optimization [2].

Example 5: Converting non-correlated subquery to correlated via condition push-down

Before
SELECT emp.name
FROM emp
WHERE emp.dept_id IN (SELECT id FROM dept WHERE state='CA')
After
SELECT emp.name
FROM emp
WHERE EXISTS (SELECT id FROM dept WHERE dept.id=emp.dept_id AND state='CA')

Materialization

If subquery is not correlated, then it can be executed only once. Result is cached in temporary table and then re-used. If expected number of returned records is small, then it makes sense to execute the request before other query parts, and send obtained result to other map nodes. If result is scalar, then entire query is replaced with a single result.

This way some queries which were too complex to extract target partitions from may be reduced to simpler form, where extraction is possible.

[1] https://mariadb.com/kb/en/library/table-pullout-optimization/
[2] https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html

Partition Extraction

TODO

Proposed Changes

TODO

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels