Currently, the AsterixDB code base has 98 rules in total. 83 rules are in use by Asterix and are grouped in collections.

We have 12 collections, and the role of each collection is detailed below. Additionally, there are 6 rules for Hivesterix and 16 rules for VXQuery; 5 rules are abstract rules or extensions, and 4 rules are currently not used.

These are our collections (1-9 logical rules, 10-12 physical rules):

  1. TypeInference (3) - validation of input types and filter conditions + unnesting.

  2. Normalization (17) - extractions, simplification, function and operator transformations

  3. CondPushDownAndJoinInference (25) - rearrangement of operator order and transformations

  4. LoadFields (9) - functions and field access improvements.

  5. FuzzyJoin (2) - InferTypes + just one rule related to FuzzyJoin (currently disabled)

  6. Consolidation (9) - Minimization of the plan

  7. AccessMethod (6) - Access operators introduce indexes, join removed

  8. PlanCleanup (6) - most were already fired + adding early projects

  9. DataExchange (1) - prior to physical plan, conversions from local to unpartitioned?

  10. PhysicalRewritesAllLevel (13) - physical transformations

  11. PhysicalRewritesTopLevel (6) - optimization of the physical plan

  12. PrepareForJobGen (4) - Adds one to one exchange and rearranges.


We apply the rule collections in a slightly different order than the one listed above:

  1. TypeInference

  2. Normalization

  3. CondPushDownAndJoinInference

  4. LoadFields

  5. FuzzyJoin

  6. Normalization - repeated

  7. CondPushDownAndJoinInference - repeated

  8. LoadFields - repeated

  9. DataExchange

  10. PhysicalRewritesAllLevel

  11. PhysicalRewritesTopLevel

  12. PrepareForJobGen

  13. PhysicalRewritesAllLevel

  14. PhysicalRewritesTopLevel

  15. PrepareForJobGen


Abstract rules:

AbstractDecorrelationRule implemented by IntroJoinInsideSubplanRule

AbstractExtractExprRule implemented by ExtractDistinctByExpressionsRule, ExtractGbyExpressionsRule, ExtractOrderExpressionsRule

AbstractIntroduceAccessMethodRule implemented by IntroduceJoinAccessMethodRule, IntroduceSelectAccessMethodRule

AbstractIntroduceCombinerRule implemented by IntroduceAggregateCombinerRule, IntroduceGroupByCombinerRule

InlineVariablesRule - AsterixInlineVariablesRule actually extends this rule


Rules used by Hive or Piglet:

InsertProjectBeforeWriteRule - used by Hive

IntroduceEarlyProjectRule - used by Hive

LocalGroupByRule - used by Hive

PushProjectIntoDataSourceScanRule - used by Hive and Piglet

RemoveRedundantProjectionRule - used by Hive

RemoveRedundantSelectRule - used by Hive


Cleanup suggestions:

Rules which are not in use:

ByNameToByHandleFieldAccessRule - TO BE REMOVED

FuzzyJoinRule - TO BE RE-ENABLED

IntroduceTransactionCommitByAssignOpRule - TO BE REMOVED

PullPositionalVariableFromUnnestRule - WHAT DOES IT DO?


Rules which should be renamed to *Rule (add the suffix Rule):

EnforceOrderByAfterSubplan

PushProperJoinThroughProduct

PushGroupByThroughProduct

FeedScanCollectionToUnnest

RemoveRedundantGroupByDecorVars

PushSimilarityFunctionsBelowJoin

PullSelectOutOfEqJoin

IntroHashPartitionMergeExchange

PushFunctionsBelowJoin (extended by PushSimilarityFunctionsBelowJoin)



Types of rule controllers:

SequentialFixpointRuleController(false) = no DFS: the rule is applied to the current operator only, but the group of rules is re-applied until the plan no longer changes

SequentialFixpointRuleController(true) = you do a DFS

SequentialOnceRuleController(true) = the rule is applied once
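The difference between applying a group of rules once and applying it until a fixpoint is reached can be illustrated with a minimal sketch. This is not the Algebricks implementation: ControllerSketch, ToyRule, applyOnce, applyToFixpoint and REMOVE_ONE_DUPLICATE are hypothetical names, the plan is reduced to a list of operator names, and the boolean DFS flag of the real controllers is not modeled.

```java
import java.util.List;
import java.util.function.Predicate;

class ControllerSketch {
    // Hypothetical stand-in for a rewrite rule: it mutates the toy plan
    // and returns true if it changed anything.
    interface ToyRule extends Predicate<List<String>> {}

    // "Once" style: every rule in the group gets exactly one chance to fire.
    static int applyOnce(List<String> plan, List<ToyRule> rules) {
        int fired = 0;
        for (ToyRule rule : rules) {
            if (rule.test(plan)) {
                fired++;
            }
        }
        return fired;
    }

    // "Fixpoint" style: the whole group is re-applied until a full pass
    // over the group produces no change.
    static int applyToFixpoint(List<String> plan, List<ToyRule> rules) {
        int fired = 0;
        boolean anyChange;
        do {
            anyChange = false;
            for (ToyRule rule : rules) {
                if (rule.test(plan)) {
                    anyChange = true;
                    fired++;
                }
            }
        } while (anyChange);
        return fired;
    }

    // Toy rule: removes one adjacent duplicate operator per application.
    static final ToyRule REMOVE_ONE_DUPLICATE = plan -> {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i).equals(plan.get(i + 1))) {
                plan.remove(i);
                return true;
            }
        }
        return false;
    };
}
```

With the toy plan [project, project, project, scan], applyOnce removes a single duplicate, while applyToFixpoint keeps re-applying the rule until only [project, scan] is left.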


There are 60 Beyond Compare sessions ready to show the plan before and after each rule was applied. Note that not all rules produce an apparent change in the plan.

I am working out a clean way to expose them on this wiki - WIP

How does each rule rewrite the logical operator

The general logic is written in AbstractRuleController.java. Each rule visits the operator graph in depth-first order. rule.rewritePre(operator) is called before the descendants are rewritten. After all the descendants have been rewritten, rule.rewritePost(operator) is called.

To implement a new rule, we need to implement rewritePre() or rewritePost(), depending on whether the rule needs to rewrite the parent first or the descendants first. We haven't seen any rule that implements both methods.
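The pre/post visiting order described above can be sketched as follows. This is a simplified illustration, not the code in AbstractRuleController.java: Op, Rule, RuleTraversalSketch and traceVisits are hypothetical names, and the real rewritePre/rewritePost methods take different parameters than the single-argument versions assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator: a name plus its input operators.
class Op {
    final String name;
    final List<Op> inputs = new ArrayList<>();

    Op(String name, Op... children) {
        this.name = name;
        for (Op child : children) {
            inputs.add(child);
        }
    }
}

// Hypothetical rule interface; a rule overrides only the hook it needs.
interface Rule {
    // Called before the descendants are visited (parent first).
    default boolean rewritePre(Op op) { return false; }

    // Called after all descendants have been visited (descendants first).
    default boolean rewritePost(Op op) { return false; }
}

class RuleTraversalSketch {
    // Depth-first walk: pre hook, then every input, then post hook.
    static boolean rewriteDepthFirst(Op op, Rule rule) {
        boolean changed = rule.rewritePre(op);
        for (Op child : op.inputs) {
            changed |= rewriteDepthFirst(child, rule);
        }
        changed |= rule.rewritePost(op);
        return changed;
    }

    // Records the order in which the hooks fire on a three-operator plan.
    static List<String> traceVisits() {
        List<String> log = new ArrayList<>();
        Rule tracing = new Rule() {
            @Override
            public boolean rewritePre(Op op) {
                log.add("pre:" + op.name);
                return false;
            }

            @Override
            public boolean rewritePost(Op op) {
                log.add("post:" + op.name);
                return false;
            }
        };
        Op plan = new Op("project", new Op("select", new Op("scan")));
        rewriteDepthFirst(plan, tracing);
        return log;
    }
}
```

For the plan project -> select -> scan, traceVisits() records pre:project, pre:select, pre:scan, post:scan, post:select, post:project, which shows why a rule that must see the parent before its inputs belongs in rewritePre and one that needs already-rewritten inputs belongs in rewritePost.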

