Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. The parser will parse a given query, convert to an AST (Abstract Syntax Tree) plan. The optimizer will detect and transform the plan with DPP pattern via planner rules, and get the optimized physical plan. The planner will generate the stream graph based on the physical plan.
  2. Submit the generated job graph to the job manager.
  3. The JM will schedule the job vertices for dim-source and DynamicPartitionCollector first. The filtered data from dim-source operator will be send both to DynamicPartitionCollector operator and the shuffle of Join operator.
  4. DynamicPartitionCollector collects the input data, removes the irrelevant column data and duplicated records, and then sends the partition data (wrapped in DynamicPartitionEvent) to the SourceCoordinator of Fact source its OperatorCoordinator once it collects all input data in finish method. The SourceCoordinator coordinator will deliver the event to the SourceCoordinators of the relating Fact sources, then the SourceCoordinators will deliver the DynamicPartitionEvent to DynamicFileSplitEnumerator.
  5. The DynamicFileSplitEnumerator finds the the relevant partitions from the all partitions via the partition data from dim-source, and creates the target splits for fact-source.
  6. The JM schedules the job vertices for fact-source.
  7. The fact-source gets the splits from DynamicFileSplitEnumerator, reads the data, send the shuffle of Join operator.
  8. The join operator reads the input data and does join operation.

...

We do not need introduce new ExecNode for DynamicParttionTableScan, because the current mechanism has met our requirements, we just need to add input edges for Batch(Stream)ExecTableSourceScan node.

Build StreamGraph

...

Currently, the planner will recursively traverse from sink to source, and create transformations from source to sink. The planner only know the sink transformations. While DynamicPartitionCollector is not a real sink for planner, and we can find the transformation of DynamicPartitionCollector from sink transformations, because source transformation has no input. So we should register the transformations of DynamicPartitionCollector to the planner separately, and build the StreamGraph with all transformations.

...