Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

In AsterixDB's logical query plan, we use a SubplanOperator which contains nested logical plans to represent subqueries. The rule InlineSubplanInputForNestedTupleSourceRule is to remove SubplanOperators containing DataScan, InnerJoin, LeftOuterJoin, UnionAll or Distinct. Given a qualified Subplan operator called S1, Let's call its input operator O1. We

 

General Cases

We have the following rewritings for general cases:

 R1R1. Replace all NestedTupleSourceOperators in S1 with deep-copies (with new variables) of the query plan rooted at O1;

 R2R2. Add a LeftOuterOperatorJoinOperator (let's call it LJ) between O1 and the SubplanOperator's       root s root operator's input (let's call it SO1), where O1 is the left branch and SO1 is the right       branchright branch;

R3. The deep copy of the primary key variables in O1 should be preserved from an inlined       NestedTupleSourceOperator to inlined NestedTupleSourceOperator to SO1. The join condition of LJ is the equality between       the between the primary key variables in O1 and its deep copied version at SO1;

...

R5. On top of the LJ, add a GroupByOperaptor in which the nested plan consists of the       S1the S1's root operator, i.e., an aggregate operator. Below the aggregate, there is a not-null-filter       on filter on variable v. The group key is the primary key variables in O1.

 

This is an abstract example for this rulethe rewriting mechanism described above

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        .....

          --Nested-Tuple-Source

    }

    --InputOp

      .....

 

After rewriting:

--Op1

  --GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ....v_ln {

            --AggregateOp

              --Select v_new!=NULL

                -- Nested-Tuple-Source

          }

     --LeftOuterJoin (v_lc_1=v_rc_1 AND .... AND v_lc_n=v_rc_n)

       (left branch)

         --InputOp

            .....

       (right branch)

         -- Assign v_new=TRUE

           --NestedOp

             .....

               --Deepcopy_The_Plan_Rooted_At_InputOp_With_New_Variables(InputOp)

 

In the plan, v_lc_1, ..., v_lc_n are live "covering" variables at InputOp,while  while v_rc_1, ..., v_rc_n are their corresponding variables populated from the deepcopy of InputOp.( "Covering" variables form a set of variables that can imply all live variables.)v v_l1, ....v_ln in the decoration part of the added group-by operator are alllive all live variables at InputOp except the covering variables v_lc_1, ..., v_lc_n.  In the current implementation, we use "covering" variables as primary key variables. In the next version, we will use the real primary key variables, which will fix ASTERIXDB-1168.

 

Here is a concrete example of the general case rewriting (optimizerts/queries/nested_loj4.aql)

Before plan:

distribute result [%0->$$27>$$13] -- |UNPARTITIONED|

  project ([$$27$$13]) -- |UNPARTITIONED|

    assign [$$27$$13] <- [function-call: asterix:open-record-constructor,   Args:[AString: {subscription-idcust}, %0->$$35>$$0, AString: {execution-time}, function-call: asterix:current-datetimeArgs:[], AString: {result}, %0->$$6orders}, %0->$$12]] -- |UNPARTITIONED|

      subplan {

                aggregate [$$12]   unnest $$6 <- [function-call: asterix:scan-collectionlistify,   Args:[%0->$$26>$$1]] -- |UNPARTITIONED|

        subplan   {                   aggregate [$$26] <- [join (function-call: asterixalgebricks:listifyeq,   Args:[%0->$$22]] >$$16, %0->$$14]) -- |UNPARTITIONED|

                    join (TRUEselect (function-call: algebricks:eq, Args:[%0->$$18, AInt64: {5}]) -- |UNPARTITIONED|

                      select (%0->$$21) -nested tuple source -- |UNPARTITIONED|

                        subplan {                                  aggregate [$$21assign [$$16] <- [function-call: asterix:nonfield-access-emptyby-streamname,   Args:[%0->$$19, AString: {o_custkey}]] -- |UNPARTITIONED|

                                    join (assign [$$19] <- [function-call: algebricks:eq, asterix:field-access-by-name, Args:[%0->$$34, %0->$$7]) >$$1, AString: {o_$o}]] -- |UNPARTITIONED|

                        data-scan []<-[$$15, $$1] <- tpch:Orders -- |UNPARTITIONED|

                        nested   empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

                            assign [$$34$$18] <- [function-call: asterix:field-access-by-index,   Args:[%0->$$10>$$0, AInt32: {13}]] -- |UNPARTITIONED|

                                        data-scan []<-[$$32$$14, $$10$$0] <- emergencyTesttpch:userLocations Customers -- |UNPARTITIONED|

                                          empty-tuple-source -- |UNPARTITIONED|                               }

 

 After plan:

distribute result [%0->$$13] -- |UNPARTITIONED|

                          data-scan []<-[$$31, $$8] <- emergencyTest:CHPReports -- project ([$$13]) -- |UNPARTITIONED|

                            nested tuple source -- |UNPARTITIONED|                      assign [$$22assign [$$13] <- [function-call: asterix:open-record-constructor,   Args:[AString: {shelter locationscust}, %0->$$0, AString: {orders}, %0->$$25>$$12]] -- |UNPARTITIONED|

      group by ([$$24 := %0->$$14]) decor ([%0->$$0; %0->$$18]) {

                          aggregate [$$25$$12] <- [function-call: asterix:listify,   Args:[%0->$$24>$$1]] -- |UNPARTITIONED|

                          assign [$$24] <- select (function-call: algebricks:not, Args:[function-call: asterixalgebricks:field-access-by-index, is-null, Args:[%0->$$11, AInt32: {1}>$$23]]) -- |UNPARTITIONED|

                            data-scan []<-[$$33, $$11] <- emergencyTest:tornadoShelters -- |UNPARTITIONED|                              empty-tuple-nested tuple source -- |UNPARTITIONED|

               } -- |UNPARTITIONED|

          assign [$$7] <- [left outer join (function-call: asterix:field-access-by-index, algebricks:eq, Args:[%0->$$5, AInt32: {1}]] >$$14, %0->$$22]) -- |UNPARTITIONED|

            assign [$$35$$18] <- [function-call: asterix:field-access-by-nameindex,   Args:[%0->$$5>$$0, AStringAInt32: {subscription-id3}]] -- |UNPARTITIONED|

              data-scan []<-[$$30$$14, $$5$$0] <- emergencyTesttpch:NearbySheltersDuringTornadoDangerChannelSubscriptions Customers -- |UNPARTITIONED|

                empty-tuple-source -- |UNPARTITIONED|

 

...

After plan:

distribute result [%0->$$27        assign [$$23] <- [TRUE] -- |UNPARTITIONED|

  project ([$$27            join (function-call: algebricks:eq, Args:[%0->$$16, %0->$$22]) -- |UNPARTITIONED|

        assign [$$27] <- [      select (function-call: asterix:open-record-constructor, algebricks:eq, Args:[AString%0->$$20, AInt64: {subscription-id}, %0->$$35, AString: {execution-time}, 5}]) -- |UNPARTITIONED|

                assign [$$20] <- [function-call: asterix:current-datetimefield-access-by-index, Args:[]%0->$$21, AStringAInt32: {result3}, %0->$$6]] -- |UNPARTITIONED|

      unnest $$6 <- function-call: asterix:scan-collection, Args:[%0->$$26]               data-scan []<-[$$22, $$21] <- tpch:Customers -- |UNPARTITIONED|

          group by ([$$44 := %0->$$30]) decor ([%0->$$35; %0->$$5; %0->$$7]) {          empty-tuple-source -- |UNPARTITIONED|

                  aggregate assign [$$26$$16] <- [function-call: asterix:listify, field-access-by-name, Args:[%0->$$22>$$19, AString: {o_custkey}]] -- |UNPARTITIONED|

                    select (assign [$$19] <- [function-call: algebricks:not, asterix:field-access-by-name, Args:[function-call: algebricks:is-null, Args:[%0->$$43]]) %0->$$1, AString: {o_$o}]] -- |UNPARTITIONED|

                      nested tuple source data-scan []<-[$$15, $$1] <- tpch:Orders -- |UNPARTITIONED|

                       } empty-tuple-source -- |UNPARTITIONED|

          left outer join (function-call: algebricks:eq, Args:[%0->$$30, %0->$$42]) -- |UNPARTITIONED|

            assign [$$7] <- [function-call: asterix:field-access-by-index, Args:[%0->$$5, AInt32: {1}]] -- |UNPARTITIONED|

              assign [$$35] <- [function-call: asterix:field-access-by-name, Args:[%0->$$5, AString: {subscription-id}]] -- |UNPARTITIONED|

                data-scan []<-[$$30, $$5] <- emergencyTest:NearbySheltersDuringTornadoDangerChannelSubscriptions -- |UNPARTITIONED|

                  empty-tuple-source -- |UNPARTITIONED|

            assign [$$43] <- [TRUE] -- |UNPARTITIONED|

              join (TRUE) -- |UNPARTITIONED|

                select (%0->$$21) -- |UNPARTITIONED|

                  group by ([$$37 := %0->$$31; $$42 := %0->$$41]) decor ([%0->$$40; %0->$$39; %0->$$38; %0->$$8]) {

                            aggregate [$$21] <- [function-call: asterix:non-empty-stream, Args:[]] -- |UNPARTITIONED|

                              select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$36]]) -- |UNPARTITIONED|

                                nested tuple source -- |UNPARTITIONED|

                         } -- |UNPARTITIONED|

                    left outer join (function-call: algebricks:eq, Args:[%0->$$34, %0->$$38]) -- |UNPARTITIONED|

                      data-scan []<-[$$31, $$8] <- emergencyTest:CHPReports -- |UNPARTITIONED|


Special Cases

For special cases where:

a. there is a join (let's call it J1.) in the nested plan,

b. if J1 is an inner join, one input pipeline of J1 has a NestedTupleSource descendant (let's call it N1),

c. if J1 is a left outer join, the left branch of J1 has a NestedTupleSource descendant (let's call it N1),

d. there is no tuple dropping from N1 to J1

 

Rewriting R2 is not necessary since before J1, all tuples from N1 are preserved. But the following rewritings are needed:

R1'. Replace N1 by the O1 (no additional deep copy);

R2'. All inner joins on the path from N1 to J1 (including J1) become left-outer joins with the same join conditions;

R3'. If N1 resides in the right branch of an inner join (let's call it J2) in the path from N1 to J1, switch the left and right branches of J2;

R4'. For every left join from N1 to J1 transformed from an inner join, a variable vi indicating non-match tuples is assigned to TRUE in its right branch;

R5'. On top of J1, a GroupByOperaptor G1 is added where the group-by key is the primary key of O1 and the nested query plan for aggregation is the nested pipeline on top of J1 with an added not-null-filter to check all vi are not null.

R6'. All other NestedTupleSourceOperators in the subplan is inlined with deep copies (with new variables) of the query plan rooted at O1.

 

This is an abstract example for the special rewriting mechanism described above: 

Before rewriting:

--Op1

  --Subplan{

    --AggregateOp

      --NestedOp

        – Inner Join (J1)

          – (Right branch) ..... (L1)

          – (Left branch) ..... (R1)

                    --Nested-Tuple-Source

    }

    --InputOp

      .....

(Note that pipeline R1 must satisfy the condition that it does not drop any tuples.)

After rewriting:

-- Op1

  – GroupBy v_lc_1, ..., v_lc_n Decor v_l1, ....v_ln {

            – AggregateOp

               – NestedOp

                 – Select v_new!=NULL

                        assign [$$38] <- [function-call: asterix:field-access-by-index, Args:[%0->$$39, AInt32: {1}]] -- |UNPARTITIONED|     – Nested-Tuple-Source

          }

     --LeftOuterJoin (J1)

       (left branch)

              –  ......  (R1)

          assign [$$40] <- [function-call: asterix:field-access-by-name,   Args:[%0->$$39, AString: {subscription-id}]] -- |UNPARTITIONED|      – InputOp

                        data-scan []<-[$$41, $$39] <- emergencyTest:NearbySheltersDuringTornadoDangerChannelSubscriptions -- |UNPARTITIONED|.....

       (right branch)

                              empty-tuple-source -- |UNPARTITIONED|– Assign v_new=TRUE 

                      assign [$$36] <- [TRUE] -- |UNPARTITIONED|

                        assign [$$34] <- [function-call: asterix:field-access-by-index, Args:[%0->$$10, AInt32: {1}]] -- |UNPARTITIONED|

                          data-scan []<-[$$32, $$10] <- emergencyTest:userLocations -- |UNPARTITIONED|

                            empty-tuple-source -- |UNPARTITIONED|

– ..... (L1)

 

In the plan, v_lc_1, ..., v_lc_n are live "covering" variables at InputOp and v_l1, ....v_ln in the decoration part of the added group-by operator are all live variables at InputOp except the covering variables v_lc_1, ..., v_lc_n.  In the current implementation, we use "covering" variables as primary key variables. In the next version, we will use the real primary key variables, which will fix ASTERIXDB-1168.

 

Here is a concrete example (optimizerts/queries/nested_loj2.aql). .

Before plan:

distribute result [%0->$$17] -- |UNPARTITIONED|

  project ([$$17]) -- |UNPARTITIONED|

    assign [$$17                assign [$$22] <- [function-call: asterix:open-record-constructor,   Args:[AString: {shelter locationscust}, %0->$$25>$$0, AString: {orders}, %0->$$16]] -- |UNPARTITIONED|

      subplan {

                aggregate [$$25$$16] <- [function-call: asterix:listify,   Args:[%0->$$24>$$15]] -- |UNPARTITIONED| 

                  assign [$$24$$15] <- [function-call: asterix:fieldopen-accessrecord-by-indexconstructor,   Args:[AString: {order}, %0->$$11>$$1, AInt32AString: {1items}, %0->$$14]] -- |UNPARTITIONED|

                    subplan {

          data-scan []                    aggregate [$$14] <- [$$33, $$11] <- emergencyTest:tornadoShelters -function-call: asterix:listify, Args:[%0->$$2]] -- |UNPARTITIONED|

                                empty-tuple-source join (function-call: algebricks:eq, Args:[%0->$$20, %0->$$19]) -- |UNPARTITIONED|

 

 

For special cases where:

a. there is a join (let's call it J1.) in the nested plan,

b. one input pipeline of J1 has a NestedTupleSource descendant (let's call it N1),

c. there is no tuple dropping from the N1 to the J1

 

Rewriting R2 is not necessary since before J1, all tuples from N1 are preserved. But rewriting R1' to R4' are needed:

R1'. Replace N1 by the O1 (no additional deep copy);

R2'. All inner joins on the path from N1 to J1 (including J1) are rewritten to a left-outer join with the same join condition;

R3'. If N1 resides in the right branch of a join (let's call it J2) in the path from N1 to J1, switch the left and right branches of J2;

R4'. On top of J1, a GroupByOperaptor G1 is added where the group-by key is the primary key of the subplan input operator and the nested query plan for aggregation is the nested pipeline on top of J1 (with a not-null-filter added).

R5'. All other NestedTupleSourceOperators in the subplan is inlined with a deep copy of the query plan rooted at O1.

 

                                nested tuple source -- |UNPARTITIONED|

                                  data-scan []<-[$$20, $$21, $$2] <- tpch:LineItems -- |UNPARTITIONED|

                                    empty-tuple-source -- |UNPARTITIONED|

                           } -- |UNPARTITIONED|

                      join (function-call: algebricks:eq, Args:[%0->$$22, %0->$$18]) -- |UNPARTITIONED|

                        nested tuple source -- |UNPARTITIONED|

                        assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                          data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                            empty-tuple-source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

          empty-tuple-source -- |UNPARTITIONED|

 

After plan:

distribute result [%0->$$17] -- |UNPARTITIONED|

  project ([$$17]) -- |UNPARTITIONED|

    assign [$$17] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$16]] -- |UNPARTITIONED|

      group by ([$$30 := %0->$$18]) decor ([%0->$$0]) {

                aggregate [$$16] <- [function-call: asterix:listify, Args:[%0->$$15]] -- |UNPARTITIONED|

                  assign [$$15] <- [function-call: asterix:open-record-constructor, Args:[AString: {order}, %0->$$1, AString: {items}, %0->$$14]] -- |UNPARTITIONED|

                    group by ([$$27 := %0->$$19]) decor ([%0->$$0; %0->$$1; %0->$$18; %0->$$22]) {

                              aggregate [$$14] <- [function-call: asterix:listify, Args:[%0->$$2]] -- |UNPARTITIONED|

                                select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$26]]) -- |UNPARTITIONED|

                                  nested tuple source -- |UNPARTITIONED|

                           } -- |UNPARTITIONED|

                      select (function-call: algebricks:and, Args:[function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$28]], function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$29]]]) -- |UNPARTITIONED|

                        nested tuple source -- |UNPARTITIONED|

             } -- |UNPARTITIONED|

        left outer join (function-call: algebricks:eq, Args:[%0->$$20, %0->$$19]) -- |UNPARTITIONED|

          left outer join (function-call: algebricks:eq, Args:[%0->$$22, %0->$$18]) -- |UNPARTITIONED|

            data-scan []<-[$$18, $$0] <- tpch:Customers -- |UNPARTITIONED|

              empty-tuple-source -- |UNPARTITIONED|

            assign [$$28] <- [TRUE] -- |UNPARTITIONED|

              assign [$$22] <- [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]] -- |UNPARTITIONED|

                data-scan []<-[$$19, $$1] <- tpch:Orders -- |UNPARTITIONED|

                  empty-tuple-source -- |UNPARTITIONED|

          assign [$$29] <- [TRUE] -- |UNPARTITIONED|

            assign [$$26] <- [TRUE] -- |UNPARTITIONED|

              data-scan []<-[$$20, $$21, $$2] <- tpch:LineItems -- |UNPARTITIONED|

                empty-tuple-source -- |UNPARTITIONED|

 

Gerrit patch for this change: 

https://asterix-gerrit.ics.uci.edu/#/c/572

https://asterix-gerrit.ics.uci.edu/#/c/579