...
Current state: [Under Discussion]
Discussion thread: - https://lists.apache.org/list?dev@flink.apache.org:lte=1M:Introduce%20Hash%20Lookup%20Join
Vote thread: -
JIRA:
Jira | ||||||
---|---|---|---|---|---|---|
|
...
Look up join is commonly used feature in Flink SQL. We have received many optimization requirements on look up join. For example:
1. Enforces left side of lookup join do a hash partitioner to raise cache hint ratio
...
Use Hint to require Hash Distribution
LookupJoinRules would check whether FlinkLogicalJoin
contain USE_HASH Hint. If yes, the rules require the input must have hash distribution on join keys when converting FlinkLogicalJoin
to LookupJoin
.
Compatibility, Deprecation, and Migration Plan
...
That is, propagating 'use_hash' hint to TableScan
with matched table names. In this way, the hint would not be missed by Flink optimizer until it needs the hint in LookupJoinRules.
LookupJoinRules would only check whether the dimension table scan has 'use_hash' hint. If yes, it would require the input must have hash distribution on join keys.
Compared with the previous solution, this solution has two advantages:
- It does not need Calcite upgrade, see
Jira server ASF JIRA serverId 5aa69414-a9e9-3523-82ec-879b028fb15b key CALCITE-4967 It does not need code refactor to ensure the hint would not be missed by Flink optimizer.
...