
Motivation

The goal of this IEP is to avoid executing SQL queries on partitions which do not contain relevant data.

An SQL query may scan an arbitrary set of cache values. In the general case, the values of interest may reside on every cluster node, so a broadcast is needed. Worse, when distributed joins are enabled, every broadcast message may in turn require another broadcast. A widely adopted optimization technique is so-called "partition pruning", which calculates the set of partitions and nodes that would return an empty result, without actually executing the query. Query parts are not executed on these nodes, which increases overall system throughput and provides nearly linear scaling.

Recent research in distributed query processing assumes that CPU is a relatively cheap resource, while network IO and disk IO are equally costly operations. Partition pruning not only saves network requests but also avoids unnecessary execution of local query parts, preventing page cache thrashing and additional (mostly random) disk reads.

...

:

  1. Try to extract information about target partitions from the SQL query
  2. If extraction succeeds, execute the query only over these partitions
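The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Ignite's implementation: the class `PartitionPruningSketch`, its methods, the partition count of 1024, and the hash-modulo key-to-partition mapping are all hypothetical, and a regular expression stands in for a real SQL parser. It recognizes only a single-table SELECT with one equality on the affinity column and falls back to a broadcast for everything else.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; every name here is hypothetical, not an Ignite API. */
final class PartitionPruningSketch {
    /** Assumed partition count for the example. */
    static final int PARTITIONS = 1024;

    /** Assumed key-to-partition mapping (plain hash modulo), chosen for simplicity. */
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    /**
     * Step 1: try to extract target partitions from the query text.
     * Only "SELECT <columns> FROM <table> WHERE <affinityColumn> = <integer>"
     * is recognized. An empty Optional means "unknown", not "no partitions".
     */
    static Optional<Set<Integer>> extractPartitions(String sql, String affinityColumn) {
        Pattern p = Pattern.compile(
            "(?i)SELECT\\s+(?:\\*|\\w+(?:\\s*,\\s*\\w+)*)\\s+FROM\\s+\\w+\\s+WHERE\\s+"
                + Pattern.quote(affinityColumn) + "\\s*=\\s*(-?\\d{1,9})");
        Matcher m = p.matcher(sql.trim());
        if (!m.matches())
            return Optional.empty();
        int key = Integer.parseInt(m.group(1));
        return Optional.of(Collections.singleton(partition(key)));
    }

    /** Step 2: run only over the extracted partitions, otherwise broadcast. */
    static String plan(String sql, String affinityColumn) {
        return extractPartitions(sql, affinityColumn)
            .map(parts -> "run on partitions " + parts)
            .orElse("broadcast to all partitions");
    }

    public static void main(String[] args) {
        System.out.println(plan("SELECT name FROM person WHERE id = 42", "id"));
        System.out.println(plan("SELECT name FROM person WHERE age > 30", "id"));
    }
}
```

Returning an empty `Optional` rather than an empty set keeps "could not extract" distinct from "no partition matches", so an unrecognized query is always broadcast and never silently pruned.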

Once implemented, it will provide the following benefits:

  1. Improved query latency, as we will be able to skip many more partitions than now (currently only backup partitions are skipped)
  2. Improved thin client latency: it will be possible to send requests directly to the target node, thus saving one network hop
  3. Decreased page cache pressure: less data to read, less data to evict, fewer page locks
  4. Improved system throughput, as fewer CPU and IO operations in total will be required to execute an optimized query

Partition pruning is already implemented in Apache Ignite in a very simplified form [1]. Only WHERE conditions with equality are considered, and only for SQL queries without joins. We should expand it further.

...