Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Look for non-equality JOIN conditions. When one is found, exit. This way join type space is reduced to equijoins.
  2. Build co-location tree, which is another tree showing how PARTITIONED tables are joined together
    1. Copy current JOIN AST into separate tree
    2. If table is REPLICATED and do not have node filter, then mark it as "ANY" and remove from the tree, as it doesn't affect JOIN outcome. Otherwise - exit, no need to bother with custom filters.
    3. If CROSS JOIN is found, then exit (might be improved in future)
    4. If tables are joined on their affinity columns and has equal affinity functions, then mark them as belonging to the same co-location group. Otherwise - assign them to different co-location groups. Repeat this for all tables and joins in the tree. Functions are defined equal if and only if the following is true:
      1. Affinity function is deterministic (e.g. RendezvousAffintiyFunction is deterministic, while FairAffinityFunction is not)
      2. Both affinity functions are equal
      3. There are no custom node filters
      4. There are no custom affinity key mappers
    5. Every subquery is assigned it's own co-location group unconditionally (may be improved in future)
    6. At this point we have a co-location tree with only PARTITIONED caches, only equi-joins, where every table is assigned a single co-location group.
  3. Extract partitions from expression tree with two additional rules:
    1. Every partition group is assigned respective co-location group from co-location tree
    2. REPLICATED caches with "ANY" policy should be eliminated as follows:

      Code Block
      languagesql
      titleANY algebra
      linenumberstrue
      (P1, :2) AND (ANY) => (P1, :2)
      (P1, :2) OR (ANY) => (P1, :2)


    3. If partition tree contain rules from different co-location groups, then exit.

  4. At this point we have partition tree over a single co-location group. All outstanding arguments could be passed through the same affinity function to get target partitions.

Query Rewrite

TODO

Subqueries

TODO

OR

TODO

Theory

This technique is used excessively by both distributed databases and classical RDBMS vendors (as a part of their partitioning/sharding solutions). First, query is analyzed and optionally re-written to allow for better partition pruning. Second, information of interesting partitions are extracted from query for every table (cache) and subquery (for complex multi-stage distributed queries). 

Query Rewrite

WHERE conditions and JOIN-s are relatively simple targets for partition extraction. The main difficulty is subqueries, as they potentially introduce unlimited nesting that complicates analysis a lot. So the main goal of query rewrite is to minimize number of subqueries. Main techniques are JOIN conversion, correlation and materialization. Interestingly, all these techniques not only simplify partition extraction, but also allow optimizer to choose better execution plan for local execution, as JOINs are much easier to be analyzed and reordered than subqueries. As we shall see below, efforts are usually focused on non-correlated subqueries as they are the most costly for straightforward execution.

JOIN conversion

Subquery rewrite

It is not easy to extract partitions from subqueries. But we can rewrite certain subqueries to joins with a technique called "join conversion". Important The goal is to convert a subquery to join if possible. Important prerequisite is that number of resulting rows is not changed. Outer(FK) → inner(PK) joins are possible. E.g. subqueries in SELECT (aka "derived table") and FROM clauses.

Example 1: JOIN conversion for derived table

...

Code Block
languagesql
titleAfter
linenumberstrue
SELECT emp.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA'

Another example where inner subquery return at most one row is outer(PK) → inner(FK GROUP BY PK) case. Subquery is not eliminated in this case, but IN expression is replaced with more comfortable JOIN.

Example 4: Replacing IN with JOIN

Code Block
languagesql
titleBefore
linenumberstrue
SELECT dept.name
FROM dept
WHERE dept.id IN (SELECT emp.dept_id FROM emp GROUP BY emp.dept_id HAVING COUNT(*)>10)
Code Block
languagesql
titleAfter
linenumberstrue
SELECT dept.name
FROM dept JOIN (SELECT emp.dept_id FROM emp GROUP BY emp.dept_id HAVING COUNT(*)>10) emp_subquery ON dept.id=emp_subquery.dept_id

Correlation

Uncorrelated subquery may need to scan huge result set. If underlying cache is PARTITIONED, then multiple nodes need to be scanned. If underlying cache is REPLICATED, then scan is local. If previous optimizations are not applicable, we can try to push-down condition from outer table to subquery. If condition is equality and both outer and inner columns are co-located, then potentially distributed query is reduced to co-located query. The downside is that subquery must be evaluated multiple times, once for every tuple from outer relation. This might be less than optimal comparing to subquery materialization explained below.  MySQL refer to this as IN-to-EXISTS optimization [2].

Example 5: Converting non-correlated subquery to correlated via condition push-down

Code Block
languagesql
titleBefore
linenumberstrue
SELECT emp.name
FROM emp
WHERE emp.dept_id IN (SELECT id FROM dept WHERE state='CA')
Code Block
languagesql
titleAfter
linenumberstrue
SELECT emp.name
FROM emp
WHERE EXISTS (SELECT id FROM dept WHERE dept.id=emp.dept_id AND state='CA')

Materialization

...

This way some queries which were too complex to extract target partitions from may be reduced to simpler form, where extraction is possible.


[1] https://mariadb.com/kb/en/library/table-pullout-optimization/
[2] https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html

Partition

...

TODO

...

Pruning on Thin Clients

TODO


Tickets

Jira
serverASF JIRA
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
maximumIssues20
jqlQueryproject = Ignite AND labels IN (iep-24) ORDER BY status
serverId5aa69414-a9e9-3523-82ec-879b028fb15b