...

a. there is a join (let's call it J1) in the nested plan,

b. if J1 is an inner join, one input pipeline of J1 has a NestedTupleSource descendant (let's call it N1),

c. if J1 is a left outer join, the left branch of J1 has a NestedTupleSource descendant (let's call it N1),

d. no tuples are dropped on the path from N1 to J1.
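Conditions a to d can be read as a check over the nested plan tree. The following is only a rough sketch in Python, not the optimizer's actual code: the `Op` class, the operator kind strings, and the `NON_DROPPING` set are all hypothetical, and the sketch simplifies condition d by treating any operator outside `NON_DROPPING` (including further joins on the path) as possibly dropping tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical operator kinds, used for illustration only.
NTS = "nested-tuple-source"
INNER = "inner-join"
LOJ = "left-outer-join"
# Assumption: these kinds never drop tuples; everything else might.
NON_DROPPING = {"assign", "project"}


@dataclass
class Op:
    kind: str
    inputs: List["Op"] = field(default_factory=list)


def find_join(op: Op) -> Optional[Op]:
    """Condition a: return the first join found in pre-order, or None."""
    if op.kind in (INNER, LOJ):
        return op
    for child in op.inputs:
        found = find_join(child)
        if found is not None:
            return found
    return None


def path_to_nts(op: Op) -> Optional[List[Op]]:
    """Return the operators from op down to the first NestedTupleSource found."""
    if op.kind == NTS:
        return [op]
    for child in op.inputs:
        sub = path_to_nts(child)
        if sub is not None:
            return [op] + sub
    return None


def special_rewrite_applies(nested_root: Op) -> bool:
    j1 = find_join(nested_root)
    if j1 is None:
        return False
    # Condition b: any input of an inner join may hold N1.
    # Condition c: only the left (first) input of a left outer join may.
    branches = j1.inputs if j1.kind == INNER else j1.inputs[:1]
    for branch in branches:
        path = path_to_nts(branch)
        if path is None:
            continue
        # Condition d: every operator between J1 and N1 preserves tuples.
        if all(op.kind in NON_DROPPING for op in path[:-1]):
            return True
    return False


ok_plan = Op("aggregate", [Op(INNER, [Op("assign", [Op(NTS)]), Op("data-scan")])])
bad_plan = Op("aggregate", [Op(INNER, [Op("select", [Op(NTS)]), Op("data-scan")])])
print(special_rewrite_applies(ok_plan), special_rewrite_applies(bad_plan))
```

A select between N1 and J1 fails the check because it may filter out tuples coming from N1, which is exactly what condition d rules out.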

 

Rewriting R2 is not necessary since, before J1, all tuples from N1 are preserved. But the following rewritings are needed:

R1'. Replace N1 by O1 (no additional deep copy);

R2'. All inner joins on the path from N1 to J1 (including J1) become left-outer joins with the same join conditions;

R3'. If N1 resides in the right branch of an inner join (let's call it J2) on the path from N1 to J1, switch the left and right branches of J2;

R4'. For every left-outer join from N1 to J1 that was transformed from an inner join, a variable vi indicating non-matching tuples is assigned TRUE in its right branch;

R5'. On top of J1, a GroupByOperator G1 is added where the group-by key is the primary key of O1 and the nested query plan for aggregation is the nested pipeline on top of J1 with an added filter to check that all vi are not null.

R6'. All other NestedTupleSourceOperators in the subplan are inlined with deep copies (with new variables) of the query plan rooted at O1.
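To see why R2', R4' and R5' together preserve the subplan's result, the two plans can be modeled over in-memory lists. This is a minimal sketch in Python, not AsterixDB code: the function names, the `pk` field, and the use of "collect into a list" as the aggregate are illustrative assumptions, and a null-padded right side is modeled as a tuple whose `v_new` is `None`.

```python
def subplan_semantics(outer, inner, cond):
    """Original plan: for each outer tuple, join with inner and aggregate."""
    return {o["pk"]: [i for i in inner if cond(o, i)] for o in outer}


def rewritten_semantics(outer, inner, cond):
    """Rewritten plan: left-outer join, flag variable, group-by with filter."""
    # R4': every tuple of the right branch gets a flag assigned TRUE.
    tagged = [dict(i, v_new=True) for i in inner]

    # R2': left-outer join; an unmatched outer tuple is padded with NULLs,
    # so its flag is NULL (None here).
    joined = []
    for o in outer:
        matches = [t for t in tagged if cond(o, t)]
        if matches:
            joined.extend((o, t) for t in matches)
        else:
            joined.append((o, {"v_new": None}))

    # R5': group by the primary key; the nested pipeline keeps only tuples
    # whose flag is not NULL before aggregating.
    groups = {}
    for o, t in joined:
        bucket = groups.setdefault(o["pk"], [])
        if t["v_new"] is not None:
            bucket.append({k: v for k, v in t.items() if k != "v_new"})
    return groups


customers = [{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}]
matches_customer = lambda c, o: c["pk"] == o["cid"]

before = subplan_semantics(customers, orders, matches_customer)
after = rewritten_semantics(customers, orders, matches_customer)
print(before == after)
```

Customer 2 has no orders. The left-outer join keeps it in the output, and the not-null filter on the flag keeps the padded tuple out of its aggregate, so it ends up with an empty group just as in the original subplan.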

 

This is an abstract example for the special rewriting mechanism described above: 

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        --Inner Join (J1)

          --(Right branch) ..... (L1)

          --(Left branch) ..... (R1)

            --Nested-Tuple-Source

    }

    --InputOp

      .....

(Note that pipeline R1 must satisfy the condition that it does not drop any tuples.)

After rewriting:

--Op1

  --GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ..., v_ln {

      --AggregateOp

        --NestedOp

          --Select v_new!=NULL

            --Nested-Tuple-Source

    }

    --LeftOuterJoin (J1)

      (left branch)

        --..... (R1)

          --InputOp

            .....

      (right branch)

        --Assign v_new=TRUE

          --..... (L1)

 

In the plan, v_lc_1, ..., v_lc_n are the live "covering" variables at InputOp, and v_l1, ..., v_ln in the decoration part of the added group-by operator are all live variables at InputOp except the covering variables v_lc_1, ..., v_lc_n. In the current implementation, we use the "covering" variables as primary key variables. In the next version, we will use the real primary key variables, which will fix ASTERIXDB-1168.

 

Here is a concrete example (optimizerts/queries/nested_loj2.aql).

...

                        assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                          data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                            empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

...