...

Code Block

language	sql
title	Extracting partition from IN
linenumbers	true

SELECT * FROM emp WHERE emp.id IN (100, :1)
=> ({name=100}, {name=:1}) => (P1, :1)

Range conditions could be converted to IN statements if column is of integer type.

Code Block

language	sql
title	BETWEEN on affinity column
linenumbers	true

SELECT * FROM emp WHERE emp.id BETWEEN 100 AND 102
=> {id BETWEEN 100 AND 102} => {100, 101, 102} => (P1, P2, P3)

Code Block

language	sql
title	Range on integer affinity column
linenumbers	true

SELECT * FROM emp WHERE emp.id > 100 AND emp.id <= 102
=> {id > 100 AND id <= 102} => {101, 102} => (P1, P2)

Composite expressions (AND, OR)

...

Code Block

language	sql
title	OR algebra
linenumbers	true

(P1) OR (P2) => (P1, P2)
(P1) OR (ALL) => (ALL)
(P1) OR () => (P1)
(P1, P2) OR (P2, P3) => (P1, P2, P3)


(:1) OR (:2) => (:1, :2)
(P1, :1) OR (P2, :2) => (P1, P2, :1, :2)

Ranges on integer types

TODO

AND

TODO

...

JOINs

Query Rewrite

Subqueries

TODO

OR

TODO

Merging Partition Info

AND condition

TODO

OR condition

TODO

...

TODO

Theory

This technique is used excessively by both distributed databases and classical RDBMS vendors (as a part of their partitioning/sharding solutions). First, query is analyzed and optionally re-written to allow for better partition pruning. Second, information of interesting partitions are extracted from query for every table (cache) and subquery (for complex multi-stage distributed queries).

Query Rewrite

WHERE conditions and JOIN-s are relatively simple targets for partition extraction. The main difficulty is subqueries, as they potentially introduce unlimited nesting that complicates analysis a lot. So the main goal of query rewrite is to minimize number of subqueries. Main techniques are JOIN conversion, correlation and materialization. Interestingly, all these techniques not only simplify partition extraction, but also allow optimizer to choose better execution plan for local execution, as JOINs are much easier to be analyzed and reordered than subqueries. As we shall see below, efforts are usually focused on non-correlated subqueries as they are the most costly for straightforward execution.

...

Page tree

Versions Compared

Old Version 23

New Version 24

Key

Composite expressions (AND, OR)

Ranges on integer types

AND

JOINs

Query Rewrite

Subqueries

OR

Merging Partition Info

AND condition

OR condition

TODO

Theory

Query Rewrite

Page tree

Page History

Versions Compared

Old Version 23

New Version 24

Key

Composite expressions (AND, OR)

Ranges on integer types

AND

JOINs

Query Rewrite

Subqueries

OR

Merging Partition Info

AND condition

OR condition

TODO

Theory

Query Rewrite