Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This technique is used excessively by both distributed databases and classical RDBMS vendors (as a part of their partitioning/sharding solutions). First, query is analyzed and optionally re-written to allow for better partition pruning. Second, information of interesting partitions are extracted from query for every table (cache) and subquery (for complex multi-stage distributed queries). 

...

WHERE conditions and JOIN-s are relatively simple targets for extract partition extraction. The main difficulty is subqueries, as they potentially introduce unlimited nesting what complicates analysis a lot. So the main goal of query rewrite is to minimize number of subqueries. Main techniques are JOIN conversion, correlation and materialization. Interestingly, all these techniques not only simplify partition extraction, but also allow optimizer to choose better execution plan for local execution, as JOINs are much easier to be analyzed and reordered than subqueries. As we shall see below, efforts are usually focused on non-correlated subqueries as they are the most costly for straightforward execution.

JOIN conversion

...

The goal is to convert a subquery to join if possible. This is possible in most cases for subqueries in SELECT (aka "derived table") and FROM clauses. Also, it is possible for some IN-clauses. 

Example 1: JOIN conversion for derived table

Code Block
languagesql
titleBefore
linenumberstrue
SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
FROM emp
WHERE emp.salary>1000


Code Block
languagesql
titleAfter
linenumberstrue
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id

df

Code Block
languagesql
titleBefore
linenumberstrue
SELECT emp.name, dept_subquery.name
FROM emp, (SELECT * FROM dept WHERE state='CA') dept_subquery
WHERE emp.salary>1000 AND emp.dept_id=dept_subquery.id


Code Block
languagesql
titleAfter
linenumberstrue
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA'

Correlation

TODO

Materialization

...