...
Partition pruning is already implemented in Apache Ignite in very simplified form [1]. Only WHERE condition with equality is considered and only for SQL queries without joins. We should expand it further.
[1] https://issues.apache.org/jira/browse/IGNITE-4509
...
In the following sections we first explain how partitions could be extracted from SQL parts, and how certain query rewrite techniques could help us with it. Then we will describe how extracted partition info is assembled in a form of tree. Then we discuss that partition extraction should be performed two times - before split for the whole query, and after split for query parts. Finally, we explain how partition info will be passed to thin clients, and how users will be able to control and monitor partition pruning.
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT * FROM emp WHERE emp.name = :1 => {name=:1} => (ALL) |
IN condition with list of values results in a merged list of affected partitions. IN condition with nested SELECT statement will not be supported for now.
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
(P1) OR (P2) => (P1, P2)
(P1) OR (ALL) => (ALL)
(P1) OR () => (P1)
(P1, P2) OR (P2, P3) => (P1, P2, P3)
(:1) OR (:2) => (:1, :2)
(P1, :1) OR (P2, :2) => (P1, P2, :1, :2) |
TODO
TODO
This technique is used excessively by both distributed databases and classical RDBMS vendors (as a part of their partitioning/sharding solutions). First, query is analyzed and optionally re-written to allow for better partition pruning. Second, information of interesting partitions are extracted from query for every table (cache) and subquery (for complex multi-stage distributed queries).
WHERE conditions and JOIN-s are relatively simple targets for partition extraction. The main difficulty is subqueries, as they potentially introduce unlimited nesting that complicates analysis a lot. So the main goal of query rewrite is to minimize number of subqueries. Main techniques are JOIN conversion, correlation and materialization. Interestingly, all these techniques not only simplify partition extraction, but also allow optimizer to choose better execution plan for local execution, as JOINs are much easier to be analyzed and reordered than subqueries. As we shall see below, efforts are usually focused on non-correlated subqueries as they are the most costly for straightforward execution.
The goal is to convert a subquery to join if possible. Important prerequisite is that number of resulting rows is not changed. Outer(FK) → inner(PK) joins are possible. E.g. subqueries in SELECT (aka "derived table") and FROM clauses.
Example 1: JOIN conversion for derived table
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
FROM emp
WHERE emp.salary>1000 |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id |
Example 2: JOIN conversion for FROM clause
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept_subquery.name
FROM emp, (SELECT * FROM dept WHERE state='CA') dept_subquery
WHERE emp.salary>1000 AND emp.dept_id=dept_subquery.id |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA' |
Also, it is possible for some IN-clauses. MariaDB calls it "table pullout optimization" [1]
Joins are very common, so it is crucial to support partition extraction for them as well. General solution might be extremely complex, so we need to define reasonable bounds where optimization is applicable, and improve them iteratively in future. We start with query AST obtained from parser. Proposed flow to extract partitions is explained below. Some of these steps could be merged to improve performance.
REPLICATED
and do not have node filter, then mark it as "ANY" and remove from the tree, as it doesn't affect JOIN outcome. Otherwise - exit, no need to bother with custom filters.RendezvousAffintiyFunction
is deterministic, while FairAffinityFunction
is not)PARTITIONED
caches, only equi-joins, where every table is assigned to a single co-location group.REPLICATED
caches with "ANY" policy should be eliminated as follows:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
(P1, :2) AND (ANY) => (P1, :2)
(P1, :2) OR (ANY) => (P1, :2) |
If partition tree contain rules from different co-location groups, then exit.
It is not easy to extract partitions from subqueries. But we can rewrite certain subqueries to joins with a technique called "join conversion". Important prerequisite is that number of resulting rows is not changed.
Example 1: JOIN conversion for derived tableExample 3: Table pullout optimization
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name FROM emp WHERE emp.salary>1000 AND, (SELECT dept.name FROM dept WHERE emp.dept_id IN (SELECT id FROM dept WHERE state='CA')=dept.id) FROM emp WHERE emp.salary>1000 |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept.name FROM emp, dept WHERE emp.salary>1000 AND emp.dept_id=dept.id AND dept.state='CA' |
Another example where inner subquery return at most one row is outer(PK) → inner(FK GROUP BY PK) case. Subquery is not eliminated in this case, but IN expression is replaced with more comfortable JOIN.
Example 4: Replacing IN with JOIN
Example 2: JOIN conversion for FROM clause
Code Blockcode | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept_subquery.name FROM dept WHEREemp, (SELECT * FROM dept.id IN (SELECT emp.dept_id FROM emp GROUP BYWHERE state='CA') dept_subquery WHERE emp.salary>1000 AND emp.dept_id HAVING COUNT(*)>10)=dept_subquery.id |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name, dept.name FROM emp, dept WHERE JOIN (SELECTemp.salary>1000 AND emp.dept_id FROM emp GROUP BY emp.dept_id HAVING COUNT(*)>10) emp_subquery ON dept.id=emp_subquery.dept_id |
Uncorrelated subquery may need to scan huge result set. If underlying cache is PARTITIONED
, then multiple nodes need to be scanned. If underlying cache is REPLICATED
, then scan is local. If previous optimizations are not applicable, we can try to push-down condition from outer table to subquery. If condition is equality and both outer and inner columns are co-located, then potentially distributed query is reduced to co-located query. The downside is that subquery must be evaluated multiple times, once for every tuple from outer relation. This might be less than optimal comparing to subquery materialization explained below. MySQL refer to this as IN-to-EXISTS optimization [2].
...
=dept.id AND dept.state='CA' |
Also, it is possible for some IN-clauses. MariaDB calls it "table pullout optimization" [1]
Example 3: Table pullout optimization
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name
FROM emp
WHERE emp.salary>1000 AND emp.dept_id IN (SELECT id FROM dept WHERE state='CA') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name FROM emp, dept WHERE emp.salary>1000 AND emp.dept_id IN (SELECT =dept.id FROMAND dept WHERE .state='CA') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
SELECT emp.name
FROM emp
WHERE EXISTS (SELECT id FROM dept WHERE dept.id=emp.dept_id AND state='CA') |
If subquery is not correlated, then it can be executed only once. Result is cached in temporary table and then re-used. If expected number of returned records is small, then it makes sense to execute the request before other query parts, and send obtained result to other map nodes. If result is scalar, then entire query is replaced with a single result.
This way some queries which were too complex to extract target partitions from may be reduced to simpler form, where extraction is possible.
[1] https://mariadb.com/kb/en/library/table-pullout-optimization/
[2] https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html
TODO
[1] https://mariadb.com/kb/en/library/table-pullout-optimization/
[2] https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization-with-exists.html
Partitions could be calculated not only on the server side, but on clients as well. This way thin clients could be able to send requests directly to affected nodes, reducing latency. To achieve this we first define serializable partition tree model:
Code Block | ||
---|---|---|
| ||
interface PartitionNode {
Collection<Integer> apply(Object[] args);
}
class PartitionGroup implements PartitionNode {
Collection<Object> parts; // Concrete partitions, arguments or both.
}
class PartitionExpression implements PartitionNode {
PartitionNode left;
PartitionNode right;
} |
Partition tree is enriched with AffinityTopologyVersion
it was built on, and affinity function descriptor. Descriptor can only be defined for well-known affinity functions, such as RendezvousAffinityFunction
.
Code Block | ||
---|---|---|
| ||
class PartitionInfo {
PartitionNode tree;
AffintiyTopologyVersion affTopVer;
AffinityFunctionDecriptor affFunc;
} |
When client executes request for the first time, the following sequence of actions happen:
Client saves partition tree and re-use it for further requests as follows:
If partition tree is extracted form the query successfully, then two types of optimizations are possible:
It is very important to let user know if partition pruning is applicable to query for performance tuning. For every cached two-step query we may expose the following information:
Also we need to let user disable this optimization. Otherwise a bug in implementation may lead to incorrect results with no workarounds. System on configuration property could be used for that.TODO
Jira | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|