In AsterixDB's logical query plan, we use a SubplanOperator, which contains nested logical plans, to represent subqueries. The rule InlineSubplanInputForNestedTupleSourceRule removes SubplanOperators containing DataScan, InnerJoin, LeftOuterJoin, UnionAll or Distinct. Given a qualified SubplanOperator called S1, let's call its input operator O1.

 

General Cases

We have the following rewritings for general cases:

R1. Replace all NestedTupleSourceOperators in S1 with deep copies (with new variables) of the query plan rooted at O1;

R2. Add a LeftOuterJoinOperator (let's call it LJ) between O1 and the SubplanOperator's root operator's input (let's call it SO1), where O1 is the left branch and SO1 is the right branch;

R3. The deep copy of the primary key variables in O1 should be preserved from an inlined NestedTupleSourceOperator to SO1. The join condition of LJ is the equality between the primary key variables in O1 and their deep-copied versions at SO1;

...

R5. On top of LJ, add a GroupByOperator in which the nested plan consists of S1's root operator, i.e., an aggregate operator. Below the aggregate, there is a not-null filter on variable v. The group key is the primary key variables in O1.

 

This is an abstract example for the rewriting mechanism described above: 

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        .....

          --Nested-Tuple-Source

    }

    --InputOp

      .....

 

After rewriting:

--Op1

  --GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ....v_ln {

            --AggregateOp

              --Select v_new!=NULL

                -- Nested-Tuple-Source

          }

     --LeftOuterJoin (v_lc_1=v_rc_1 AND .... AND v_lc_n=v_rc_n)

       (left branch)

         --InputOp

            .....

       (right branch)

         -- Assign v_new=TRUE

           --NestedOp

             .....

               --Deepcopy_The_Plan_Rooted_At_InputOp_With_New_Variables(InputOp)

 

In the plan, v_lc_1, ..., v_lc_n are live "covering" variables at InputOp, while v_rc_1, ..., v_rc_n are their corresponding variables populated from the deep copy of InputOp. ("Covering" variables form a set of variables that can imply all live variables.) v_l1, ..., v_ln in the decoration part of the added group-by operator are all live variables at InputOp except the covering variables v_lc_1, ..., v_lc_n.
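To make the abstract plans above more tangible, here is a minimal Python sketch that simulates both plans on in-memory rows and checks that they produce the same result. It is only an illustration of the semantics, not AsterixDB code: the row layout (`pk`, `orders`), the `nested_op` function standing in for NestedOp, and the choice of a count aggregate are all assumptions made for this example.

```python
def nested_op(row):
    # Hypothetical NestedOp: emit one tuple per order amount above 100.
    return [dict(row, amount=a) for a in row["orders"] if a > 100]


def run_subplan(input_rows):
    """Before rewriting: evaluate the nested plan once per tuple of InputOp."""
    out = []
    for row in input_rows:
        nested = nested_op(row)               # pipeline above Nested-Tuple-Source
        out.append((row["pk"], len(nested)))  # AggregateOp modeled as a count
    return out


def run_rewritten(input_rows):
    """After rewriting: left outer join plus group-by with a not-null filter."""
    # Right branch: deep copy of InputOp with new variables (pk becomes pk_rc),
    # then NestedOp, then Assign v_new=TRUE.
    copy = [{"pk_rc": r["pk"], "orders": list(r["orders"])} for r in input_rows]
    right = [dict(t, v_new=True) for c in copy for t in nested_op(c)]

    # LeftOuterJoin on pk = pk_rc; unmatched left tuples get v_new = None (NULL).
    joined = []
    for left in input_rows:
        matches = [r for r in right if r["pk_rc"] == left["pk"]]
        if matches:
            joined.extend({**left, "amount": m["amount"], "v_new": True} for m in matches)
        else:
            joined.append({**left, "v_new": None})

    # GroupBy on the key; the nested plan is Select v_new!=NULL below the aggregate.
    groups = {}
    for row in joined:
        groups.setdefault(row["pk"], []).append(row)
    return [(pk, sum(1 for r in rows if r["v_new"] is not None))
            for pk, rows in groups.items()]


rows = [
    {"pk": 1, "orders": [50, 150, 200]},
    {"pk": 2, "orders": [10]},
    {"pk": 3, "orders": []},
]
assert run_subplan(rows) == run_rewritten(rows)
```

The not-null filter is what keeps tuples 2 and 3 from being counted once: the left outer join pads them with a NULL row, and the filter removes that padding before the aggregate runs.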

 

 In the current implementation, we use "covering" variables as primary key variables. In the next version, we will use the real primary key variables, which will fix ASTERIXDB-1168.

 

Here is a concrete example of the general case rewriting (optimizerts/queries/nested_loj4.aql). 

Before plan:

distribute result [%0->$$13] -- |UNPARTITIONED|

...

            empty-tuple-source -- |UNPARTITIONED|

 

 After plan:

distribute result [%0->$$13] -- |UNPARTITIONED|

...

a. there is a join (let's call it J1) in the nested plan,

b. if J1 is an inner join, one input pipeline of J1 has a NestedTupleSource descendant (let's call it N1),

c. if J1 is a left outer join, the left branch of J1 has a NestedTupleSource descendant (let's call it N1),

d. no tuples are dropped on the path from N1 to J1.
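Conditions a to d can be read as a check over the nested plan tree. The sketch below expresses that check on a toy operator tree. Everything in it is hypothetical: the `Op` class, the operator kind strings, and in particular the `DROPPING` set, which is only an assumption about which operators count as "dropping tuples"; the real rule works on Algebricks operators and may decide this differently.

```python
from dataclasses import dataclass, field

NTS = "nested-tuple-source"
JOINS = ("inner-join", "left-outer-join")
# Assumption: operators treated as tuple-dropping in this toy model.
DROPPING = {"select", "limit"}


@dataclass
class Op:
    kind: str
    inputs: list = field(default_factory=list)  # inputs[0] is the left branch


def paths_to_nts(op):
    """Yield every operator path from op down to a nested-tuple-source."""
    if op.kind == NTS:
        yield [op]
    for child in op.inputs:
        for sub in paths_to_nts(child):
            yield [op] + sub


def find_joins(op):
    """Yield all join operators in the tree rooted at op (condition a)."""
    if op.kind in JOINS:
        yield op
    for child in op.inputs:
        yield from find_joins(child)


def qualifies_for_special_case(nested_root):
    for j1 in find_joins(nested_root):
        # Condition b: any branch of an inner join may hold N1.
        # Condition c: only the left branch of a left outer join may hold N1.
        branches = j1.inputs if j1.kind == "inner-join" else j1.inputs[:1]
        for branch in branches:
            for path in paths_to_nts(branch):
                # Condition d: nothing on the path from N1 to J1 drops tuples.
                if all(o.kind not in DROPPING for o in path):
                    return True
    return False


plan = Op("aggregate", [Op("inner-join", [Op("assign", [Op(NTS)]), Op("data-scan")])])
assert qualifies_for_special_case(plan)
```

Joins that sit between N1 and J1 are deliberately not listed in `DROPPING` here, since the special-case rewriting turns inner joins on that path into left-outer joins.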

 

Rewriting R2 is not necessary since all tuples from N1 are preserved before J1. But the following rewritings are needed:

R1'. Replace N1 with O1 (no additional deep copy);

R2'. All inner joins on the path from N1 to J1 (including J1) become left-outer joins with the same join conditions;

R3'. If N1 resides in the right branch of an inner join (let's call it J2) on the path from N1 to J1, switch the left and right branches of J2;

R4'. For every left-outer join from N1 to J1 transformed from an inner join, a variable vi indicating non-match tuples is assigned to TRUE in its right branch;

R5'. On top of J1, a GroupByOperator G1 is added where the group-by key is the primary key of O1 and the nested query plan for aggregation is the nested pipeline on top of J1 with an added not-null filter to check that all vi are not null.

R6'. All other NestedTupleSourceOperators in the subplan are inlined with deep copies (with new variables) of the query plan rooted at O1.
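As a rough illustration of R1' to R5', the following Python sketch simulates a subplan whose nested plan is a single inner join between the nested tuple source and a second input, and compares it with the rewritten left-outer-join form. It is a toy model under stated assumptions, not the rule's implementation: the customer/order rows, the field names `cid`, `o_cid`, `oid`, the indicator name `v1`, and the listify-style aggregate are all made up for this example.

```python
def run_subplan(customers, orders):
    """Before rewriting: evaluate the nested inner join once per customer tuple."""
    out = []
    for c in customers:  # N1 yields the current tuple of O1
        joined = [o for o in orders if o["o_cid"] == c["cid"]]  # inner join J1
        out.append((c["cid"], [o["oid"] for o in joined]))      # aggregate as a list
    return out


def run_rewritten(customers, orders):
    """After rewriting: J1 becomes a left outer join fed directly by O1."""
    # R4': assign the indicator variable v1 = TRUE in the right branch.
    right = [dict(o, v1=True) for o in orders]

    # R1' and R2': O1 replaces N1 without a deep copy, and J1 keeps its
    # original condition but is evaluated as a left outer join.
    joined = []
    for c in customers:
        matches = [r for r in right if r["o_cid"] == c["cid"]]
        if matches:
            joined.extend({**c, **m} for m in matches)
        else:
            joined.append({**c, "oid": None, "o_cid": None, "v1": None})

    # R5': group by the key of O1; the not-null filter on v1 sits below the aggregate.
    groups = {}
    for row in joined:
        groups.setdefault(row["cid"], []).append(row)
    return [(cid, [r["oid"] for r in rows if r["v1"] is not None])
            for cid, rows in groups.items()]


customers = [{"cid": 1}, {"cid": 2}]
orders = [
    {"oid": 10, "o_cid": 1},
    {"oid": 11, "o_cid": 1},
    {"oid": 12, "o_cid": 3},
]
assert run_subplan(customers, orders) == run_rewritten(customers, orders)
```

Compared with the general case, no self-join on the key is needed here: the original join condition is reused, which is why R2 can be skipped.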

 

This is an abstract example for the special rewriting mechanism described above: 

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        --Inner Join (J1)

          --(Right branch) ..... (L1)

          --(Left branch) ..... (R1)

                    --Nested-Tuple-Source

    }

    --InputOp

      .....

(Note that pipeline R1 must satisfy the condition that it does not drop any tuples.)

After rewriting:

--Op1

  --GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ..., v_ln {

            --AggregateOp

               --NestedOp

                 --Select v_new!=NULL

                   --Nested-Tuple-Source

          }

     --LeftOuterJoin (J1)

       (left branch)

              --..... (R1)

                 --InputOp

                   .....

       (right branch)

             --Assign v_new=TRUE

                --..... (L1)

 

In the plan, v_lc_1, ..., v_lc_n are live "covering" variables at InputOp and v_l1, ....v_ln in the decoration part of the added group-by operator are all live variables at InputOp except the covering variables v_lc_1, ..., v_lc_n.  In the current implementation, we use "covering" variables as primary key variables. In the next version, we will use the real primary key variables, which will fix ASTERIXDB-1168.

 

Here is a concrete example (optimizerts/queries/nested_loj2.aql).

Before plan (nested_loj2):

distribute result [%0->$$17] -- |UNPARTITIONED|

...

                        assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                          data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                            empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

          empty-tuple-source -- |UNPARTITIONED|

 

After plan:

distribute result [%0->$$17] -- |UNPARTITIONED|

...

          left outer join (function-call: algebricks:eq, Args:[%0->$$22, %0->$$18]) -- |UNPARTITIONED|

            data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

              empty-tuple-source -- |UNPARTITIONED|

            assign [$$28] <- [TRUE] -- |UNPARTITIONED|

...

                empty-tuple-source -- |UNPARTITIONED|

 

Gerrit patches for this change:

https://asterix-gerrit.ics.uci.edu/#/c/572

https://asterix-gerrit.ics.uci.edu/#/c/579