In AsterixDB's logical query plan, a SubplanOperator, which contains nested logical plans, represents a subquery. The rule InlineSubplanInputForNestedTupleSourceRule removes SubplanOperators whose nested plans contain a DataScan, InnerJoin, LeftOuterJoin, UnionAll, or Distinct operator. Given a qualifying SubplanOperator S1, let O1 denote its input operator.

 

General Cases

We have the following rewritings for general cases:

R1. Replace all NestedTupleSourceOperators in S1 with deep-copies (with new variables) of the query plan rooted at O1;

R2. Add a LeftOuterJoinOperator (let's call it LJ) between O1 and the input of the SubplanOperator's root operator (let's call it SO1), where O1 is the left branch and SO1 is the right branch;

R3. The deep copies of the primary key variables in O1 should be preserved from an inlined NestedTupleSourceOperator up to SO1. The join condition of LJ is the equality between the primary key variables in O1 and their deep-copied versions at SO1;

R4. A variable v that marks matched tuples is assigned TRUE between LJ and SO1, so that after the left outer join v is null exactly for the non-matching tuples;

R5. On top of LJ, add a GroupByOperator whose nested plan consists of S1's root operator, i.e., an aggregate operator. Below the aggregate, there is a not-null filter on variable v. The group key is the primary key variables in O1.
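The rewritings R1 to R5 can be illustrated with a small Python sketch over in-memory lists. This is not AsterixDB code: the data, the field names (c_id, seg, o_custkey, o_id), and both functions are hypothetical and only mimic the semantics of the operators. The sketch assumes that c_id is the primary key of the input and that the aggregate is a listify. The first function evaluates the nested plan once per input tuple, the way a subplan does; the second applies R1 to R5 and is expected to produce the same result.

```python
# Hypothetical toy data: c_id plays the role of the primary key of O1.
CUSTOMERS = [
    {"c_id": 1, "seg": 5},
    {"c_id": 2, "seg": 5},
    {"c_id": 3, "seg": 7},
]
ORDERS = [
    {"o_id": 10, "o_custkey": 1},
    {"o_id": 11, "o_custkey": 1},
    {"o_id": 12, "o_custkey": 3},
]


def eval_subplan(outer, inner):
    """Subplan semantics: run the nested plan once per tuple of O1."""
    result = []
    for c in outer:
        # Nested plan: select on the nested tuple source, join, then listify.
        nts = [c] if c["seg"] == 5 else []
        listified = [o["o_id"] for t in nts for o in inner
                     if o["o_custkey"] == t["c_id"]]
        result.append((c["c_id"], listified))
    return result


def eval_rewritten(outer, inner):
    """The same query after applying R1 to R5."""
    # R1: the nested tuple source is replaced by a copy of the input plan.
    outer_copy = [dict(c) for c in outer]
    # R3 + R4: SO1 keeps the copied key and gets the marker v = True.
    so1 = [{"c_id_copy": t["c_id"], "o_id": o["o_id"], "v": True}
           for t in outer_copy if t["seg"] == 5
           for o in inner if o["o_custkey"] == t["c_id"]]
    # R2: left outer join LJ on c_id = c_id_copy; unmatched rows get nulls.
    joined = []
    for c in outer:
        matches = [r for r in so1 if r["c_id_copy"] == c["c_id"]]
        if not matches:
            matches = [{"c_id_copy": None, "o_id": None, "v": None}]
        joined.extend({**c, **r} for r in matches)
    # R5: group by the key; the nested plan filters on v, then listifies.
    groups = {}
    for row in joined:
        bucket = groups.setdefault(row["c_id"], [])
        if row["v"] is not None:
            bucket.append(row["o_id"])
    return list(groups.items())
```

With this toy data, both functions return an empty list for customers 2 and 3. The null-padded rows produced by the left outer join keep those customers in the output, and the filter on v keeps them from contributing a value to the aggregate.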

 

The following abstract example illustrates the rewriting mechanism described above:

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        .....

          --Nested-Tuple-Source

    }

    --InputOp

      .....

 

After rewriting:

--Op1

  --GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ..., v_ln {

            --AggregateOp

              --Select v_new!=NULL

                -- Nested-Tuple-Source

          }

     --LeftOuterJoin (v_lc_1=v_rc_1 AND ... AND v_lc_n=v_rc_n)

       (left branch)

         --InputOp

            .....

       (right branch)

         -- Assign v_new=TRUE

           --NestedOp

             .....

               --Deepcopy_The_Plan_Rooted_At_InputOp_With_New_Variables(InputOp)

 

In this plan, v_lc_1, ..., v_lc_n are the live "covering" variables at InputOp, and v_rc_1, ..., v_rc_n are the corresponding variables produced by the deep copy of InputOp. "Covering" variables are a set of variables that imply all live variables. The variables v_l1, ..., v_ln in the decoration part of the added group-by operator are all live variables at InputOp other than the covering variables v_lc_1, ..., v_lc_n. The current implementation uses the "covering" variables in place of the primary key variables. The next version will use the real primary key variables, which will fix ASTERIXDB-1168.
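The split between group-by keys and decor variables can be sketched in Python as follows. The helper group_with_decor and the row layout are hypothetical and not part of Algebricks. The sketch assumes that the key variables are covering, i.e., that every decor variable has a single value per key, and it raises an error when that assumption is violated.

```python
def group_with_decor(rows, key_vars, decor_vars):
    """Group rows by key_vars and carry decor_vars along unchanged.

    Hypothetical helper: it assumes the key variables are "covering",
    so all rows of a group must agree on every decor variable.
    """
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in key_vars)
        decor = tuple(row[d] for d in decor_vars)
        if key not in groups:
            groups[key] = {"decor": decor, "rows": []}
        elif groups[key]["decor"] != decor:
            raise ValueError(f"key {key} does not determine the decor variables")
        groups[key]["rows"].append(row)
    return groups


# Rows as they might look after the left outer join: v14 is the key,
# v0 and v18 are the other live variables of the input.
rows = [
    {"v14": 1, "v0": "cust-1", "v18": 5, "v1": "order-a"},
    {"v14": 1, "v0": "cust-1", "v18": 5, "v1": "order-b"},
    {"v14": 2, "v0": "cust-2", "v18": 5, "v1": None},
]
grouped = group_with_decor(rows, key_vars=["v14"], decor_vars=["v0", "v18"])
```

Since the decor values are the same for every row of a group, the group-by can take them from any row instead of including them in the grouping key.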

 

Here is a concrete example of the general case rewriting (optimizerts/queries/nested_loj4.aql). 

Before plan:

distribute result [%0->$$13] -- |UNPARTITIONED|

  project ([$$13]) -- |UNPARTITIONED|

    assign [$$13] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$12]] -- |UNPARTITIONED|

      subplan {

                aggregate [$$12] <- [function-call: asterix:listify, Args:[%0->$$1]] -- |UNPARTITIONED|

                  join (function-call: algebricks:eq, Args:[%0->$$16, %0->$$14]) -- |UNPARTITIONED|

                    select (function-call: algebricks:eq, Args:[%0->$$18, AInt64: {5}]) -- |UNPARTITIONED|

                      nested tuple source -- |UNPARTITIONED|

                    assign [$$16] <- [function-call: asterix:field-access-by-name, Args:[%0->$$19, AString: {o_custkey}]] -- |UNPARTITIONED|

                      assign [$$19] <- [function-call: asterix:field-access-by-name, Args:[%0->$$1, AString: {o_$o}]] -- |UNPARTITIONED|

                        data-scan []<-[$$15, $$1] <- tpch:Orders -- |UNPARTITIONED|

                          empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        assign [$$18] <- [function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {3}]] -- |UNPARTITIONED|

          data-scan []<-[$$14, $$0] <- tpch:Customers -- |UNPARTITIONED|

            empty-tuple-source -- |UNPARTITIONED|

 

After plan:

distribute result [%0->$$13] -- |UNPARTITIONED|

  project ([$$13]) -- |UNPARTITIONED|

    assign [$$13] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$12]] -- |UNPARTITIONED|

      group by ([$$24 := %0->$$14]) decor ([%0->$$0; %0->$$18]) {

                aggregate [$$12] <- [function-call: asterix:listify, Args:[%0->$$1]] -- |UNPARTITIONED|

                  select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$23]]) -- |UNPARTITIONED|

                    nested tuple source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$22]) -- |UNPARTITIONED|

          assign [$$18] <- [function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {3}]] -- |UNPARTITIONED|

            data-scan []<-[$$14, $$0] <- tpch:Customers -- |UNPARTITIONED|

              empty-tuple-source -- |UNPARTITIONED|

          assign [$$23] <- [TRUE] -- |UNPARTITIONED|

            join (function-call: algebricks:eq, Args:[%0->$$16, %0->$$22]) -- |UNPARTITIONED|

              select (function-call: algebricks:eq, Args:[%0->$$20, AInt64: {5}]) -- |UNPARTITIONED|

                assign [$$20] <- [function-call: asterix:field-access-by-index, Args:[%0->$$21, AInt32: {3}]] -- |UNPARTITIONED|

                  data-scan []<-[$$22, $$21] <- tpch:Customers -- |UNPARTITIONED|

                    empty-tuple-source -- |UNPARTITIONED|

              assign [$$16] <- [function-call: asterix:field-access-by-name, Args:[%0->$$19, AString: {o_custkey}]] -- |UNPARTITIONED|

                assign [$$19] <- [function-call: asterix:field-access-by-name, Args:[%0->$$1, AString: {o_$o}]] -- |UNPARTITIONED|

                  data-scan []<-[$$15, $$1] <- tpch:Orders -- |UNPARTITIONED|

                    empty-tuple-source -- |UNPARTITIONED|


Special Cases

For special cases where:

a. there is a join (let's call it J1) in the nested plan,

b. one input pipeline of J1 has a NestedTupleSource descendant (let's call it N1),

c. no tuple is dropped on the path from N1 to J1,

 

Rewriting R2 is not necessary since all tuples from N1 are preserved before J1. Instead, rewritings R1' to R5' are needed:

R1'. Replace N1 with O1 (no additional deep copy);

R2'. All inner joins on the path from N1 to J1 (including J1) are rewritten to left outer joins with the same join conditions;

R3'. If N1 resides in the right branch of a join (let's call it J2) on the path from N1 to J1, switch the left and right branches of J2;

R4'. On top of J1, a GroupByOperator G1 is added where the group-by key is the primary key of O1 and the nested query plan for aggregation is the nested pipeline on top of J1 (with a not-null filter added);

R5'. All other NestedTupleSourceOperators in the subplan are inlined with the query plan rooted at O1.
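A minimal Python sketch of the special case follows. It is a toy model, not AsterixDB code: the data and function names are hypothetical, c_id is assumed to be the primary key of O1, and the nested plan is assumed to be a single join J1 between N1 and an orders scan, followed by a listify. Because no tuple is dropped between N1 and J1, O1 is used directly as the left branch of the join and no copy is made. The TRUE marker assigned on the other branch is what the not-null filter of R4' tests.

```python
CUSTOMERS = [{"c_id": 1}, {"c_id": 2}]
ORDERS = [
    {"o_id": 10, "o_custkey": 1},
    {"o_id": 11, "o_custkey": 1},
]


def eval_subplan_join(outer, inner):
    """Subplan semantics: J1 joins the nested tuple source with the scan."""
    return [(c["c_id"],
             [o["o_id"] for o in inner if o["o_custkey"] == c["c_id"]])
            for c in outer]


def eval_special_case(outer, inner):
    """The same query after applying R1' to R4'."""
    # Marker on the non-O1 branch so that null-padded rows can be detected.
    right = [{**o, "v": True} for o in inner]
    # R1' + R2': O1 replaces N1 and J1 becomes a left outer join with the
    # same condition. O1 is already the left branch, so R3' does nothing.
    joined = []
    for c in outer:
        matches = [r for r in right if r["o_custkey"] == c["c_id"]]
        if not matches:
            matches = [{"o_id": None, "o_custkey": None, "v": None}]
        joined.extend({**c, **r} for r in matches)
    # R4': group by the key of O1; filter on v, then listify.
    groups = {}
    for row in joined:
        bucket = groups.setdefault(row["c_id"], [])
        if row["v"] is not None:
            bucket.append(row["o_id"])
    return list(groups.items())
```

Compared with the general case, the input is scanned only once here, since the join against a copied input plan is avoided.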

 

Here is a concrete example of the special case rewriting (optimizerts/queries/nested_loj2.aql).

Before plan:

distribute result [%0->$$17] -- |UNPARTITIONED|

  project ([$$17]) -- |UNPARTITIONED|

    assign [$$17] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$16]] -- |UNPARTITIONED|

      subplan {

                aggregate [$$16] <- [function-call: asterix:listify, Args:[%0->$$15]] -- |UNPARTITIONED|

                  assign [$$15] <- [function-call: asterix:open-record-constructor, Args:[AString: {order}, %0->$$1, AString: {items}, %0->$$14]] -- |UNPARTITIONED|

                    subplan {

                              aggregate [$$14] <- [function-call: asterix:listify, Args:[%0->$$2]] -- |UNPARTITIONED|

                                join (function-call: algebricks:eq, Args:[%0->$$20, %0->$$19]) -- |UNPARTITIONED|

                                  nested tuple source -- |UNPARTITIONED|

                                  data-scan []<-[$$20, $$21, $$2] <- tpch:LineItems -- |UNPARTITIONED|

                                    empty-tuple-source -- |UNPARTITIONED|

                           } -- |UNPARTITIONED|

                      join (function-call: algebricks:eq, Args:[%0->$$22, %0->$$18]) -- |UNPARTITIONED|

                        nested tuple source -- |UNPARTITIONED|

                        assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                          data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                            empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

          empty-tuple-source -- |UNPARTITIONED|

 

After plan:

distribute result [%0->$$17] -- |UNPARTITIONED|

  project ([$$17]) -- |UNPARTITIONED|

    assign [$$17] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$16]] -- |UNPARTITIONED|

      group by ([$$30 := %0->$$18]) decor ([%0->$$0]) {

                aggregate [$$16] <- [function-call: asterix:listify, Args:[%0->$$15]] -- |UNPARTITIONED|

                  assign [$$15] <- [function-call: asterix:open-record-constructor, Args:[AString: {order}, %0->$$1, AString: {items}, %0->$$14]] -- |UNPARTITIONED|

                    group by ([$$27 := %0->$$19]) decor ([%0->$$0; %0->$$1; %0->$$18; %0->$$22]) {

                              aggregate [$$14] <- [function-call: asterix:listify, Args:[%0->$$2]] -- |UNPARTITIONED|

                                select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$26]]) -- |UNPARTITIONED|

                                  nested tuple source -- |UNPARTITIONED|

                           } -- |UNPARTITIONED|

                      select (function-call: algebricks:and, Args:[function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$28]], function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$29]]]) -- |UNPARTITIONED|

                        nested tuple source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        left outer join (function-call: algebricks:eq, Args:[%0->$$20, %0->$$19]) -- |UNPARTITIONED|

          left outer join (function-call: algebricks:eq, Args:[%0->$$22, %0->$$18]) -- |UNPARTITIONED|

            data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

              empty-tuple-source -- |UNPARTITIONED|

            assign [$$28] <- [TRUE] -- |UNPARTITIONED|

              assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                  empty-tuple-source -- |UNPARTITIONED|

          assign [$$29] <- [TRUE] -- |UNPARTITIONED|

            assign [$$26] <- [TRUE] -- |UNPARTITIONED|

              data-scan []<-[$$20, $$21, $$2] <- tpch:LineItems -- |UNPARTITIONED|

                empty-tuple-source -- |UNPARTITIONED|

 

Gerrit patches for this change:

https://asterix-gerrit.ics.uci.edu/#/c/572

https://asterix-gerrit.ics.uci.edu/#/c/579

 
