Status
Current state: "Under Discussion"
Discussion thread:
JIRA: here (<- link to
) Jira server ASF JIRA serverId 5aa69414-a9e9-3523-82ec-879b028fb15b key FLINK-27982
...
In the field of data analysis, star-schema[1] is the simplest style of and most widely used data mart patterns. It's natural in a star schema to map one or more dimensions to partition columns. Detecting and avoiding unnecessary data scans is the most important responsibility of the optimizer. Currently, Flink supports static partition pruning: the conditions in the WHERE clause are analyzed to determine in advance which partitions can be safely skipped in the optimization phase. Another common scenario: the partitions information is not available in the optimization phase but in the execution phase. That's the problem this FLIP is trying to solve: dynamic partition pruning, which could reduce the partition table source IO.
...
In this FLIP we will introduce a mechanism for detecting dynamic partition pruning patterns in optimization phase and performing partition pruning at runtime by sending the dimension table results to the SplitEnumerator of fact table via existed coordinator existing coordinator mechanism.
In above example, the result from date_dim
which d_year
is 2000
will be send to join build side and the Coordinator, and the SplitEnumerator of store_returns
will filter the partitions based on the filtered date_dim
data, and send the real required splits to the source scan of store_returns
.
...
Since the split enumerator is executed in JM, the filtered partition data from dim-source operator should be send to the split enumerator via existed coordinator existing coordinator mechanism. So we introduce DynamicPartitionEvent
to wrap the partition data.
...
Currently, for FileSystem connector and Hive connector, the splits are created by FileEnumerator
. However, existed existing FileEnumerators do not match the requirement, therefore we introduce a new FileEnumerator
named DynamicFileEnumerator
to support creating splits based on partition.
...