Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The list may be increased.

Implementation details.

Apache Calcite uses graph based representation of relational operators, each node has node specific meta information (join type, projection columns, sort operation direction) and general outgoing data properties (traits in terms of Calcite). Each node may have limited count of trait types (equal to count of registered Trait definitions for a planner). Trait types are defined for whole graph before its construction. Calcite framework transforms nodes (node specific properties), data traits and even nodes position based on initial nodes meta information, data traits and relative position.

Example transformation (filter push down + projection recalculation):

Gliffy Diagram
nameNodes transformation
pagePin3

In scope of the transformation we optimized joined rows count (by putting filter right after scan) and memory usage (cutting unused row columns with additional projection nodes)


In addition to node outgoing data traits, there is a special trait - Convention, it describes an execution approach (how the node produces outgoing data). There are two types of convention: logical - the graph is used for optimization related transformations only, nodes do not produce any data and only show the execution plan, and physical - the graph is used for execution task/tasks building (such graph, for example, may be cast to a cursor type and returned to a user).

Initially Calcite was a planning framework only, it knew nothing about distribution (and was used for non-distributed databases only) and required execution layer implementing. At now it has a number of distribution related rules/converters and may transform an execution graph into an iterator.

As Apache Drill guys did, we are going to introduce

  • two Ignite conventions: Logical and Physical
  • intermediate serializable execution graph representation.
  • a special TraitDef describing distribution traits of a data node.

Ignite logical convention (our own implementations for each type of relational nodes, which cannot be executed itself) is needed for next purposes:

  1. Provide Ignite specific costs system.
  2. Provide methods to build an intermediate graph representation in a visitor style (widely used in Calcite)
  3. Provide methods to traverse an execution graph (it's needed for metadata population/extraction, to do some kind of pre/post processing of an execution tree)

Ignite physical convention is needed to have full control on what happens at execution time (is needed to have memory usage control, intermediate results swapping, execution time optimizations (like code generation, batching), concurrency level control, etc).

Intermediate serializable execution graph representation is needed to send an execution plan to actual executor, since a graph is not serializable, has links to a context and a planner it cannot be serialized without an additional transformation.

Ignite distribution trait definition is needed to make right traits conversion. We need an Ignite specific implementation instead of Calcite abstract one because we need some extra data in distribution trait to calculate target nodes list (remote executors list). Also we need a control on how a particular node distribution calculates, moreover, in Calcite terms HASH distribution isn't the same as a HASH distribution in terms of Ignite. So, most of calcite code may be reused, but some additional Ignite specific logic is needed to make distributions work in a proper way.

The execution flow will look like:

Gliffy Diagram
nameExecution steps
pagePin1

Logical to physical convention transformation example:

Lets assume we build a physical graph for the previously transformed logical one (the graph showed above), also lets assume the left table is replicated and the right one is partitioned, the data is co-located and only two phases is needed, after transformation the execution graph becomes:

Gliffy Diagram
namePhysical plan
pagePin4

Here we see an additional relational operator - Single exchange, it says what everything under the exchange block executes on all data nodes, the result is collected on a client node. The execution is the same as current H2 one.

But having both tables partitioned and non-colocated, the final execution will look like:

Gliffy Diagram
nameThree steps query
pagePin2

Here we see two additional blocks, representing a repartitioning operation. Technically Ignite cannot execute such query now because it requires an additional aggregation step.


In step diagram you may see a physical plan optimization step. Distribution is a physical trait and appears in physical plan, despite the fact the plan is already optimized, we can apply several rules to the plan to optimize exchanges. For example one of tables may have small count of rows and it may be cheaper to send all these rows to nodes, containing partitions of huge table, this way one exchange disappears and another one becomes a Broadcast exchange:

Gliffy Diagram
nameOptimized physical plan
pagePin1

Expected integration steps:

...