Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Current state[Under Discussion]

Discussion thread: - https://lists.apache.org/list?dev@flink.apache.org:lte=1M:Introduce%20Hash%20Lookup%20Join

Vote thread: -

JIRA:

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-23687

...

Look up join is commonly used feature in Flink SQL. We have received many optimization requirements on look up join. For example:
1. Enforces left side of lookup join do a hash partitioner to raise cache hint ratio

...

Use Hint to require Hash Distribution

LookupJoinRules would check whether FlinkLogicalJoin contain USE_HASH Hint. If yes, the rules require the input must have hash distribution on join keys when converting FlinkLogicalJoin to LookupJoin .

Compatibility, Deprecation, and Migration Plan

...

That is, propagating 'use_hash' hint to TableScan  with matched table names. In this way, the hint would not be missed by Flink optimizer until it needs the hint in  LookupJoinRules.

LookupJoinRules would only check whether the dimension table scan has  'use_hash' hint. If yes, it would require the input must have hash distribution on join keys.

Compared with the previous solution, this solution has two advantages:

  1. It does not need Calcite upgrade, see
    Jira
    serverASF JIRA
    serverId5aa69414-a9e9-3523-82ec-879b028fb15b
    keyCALCITE-4967
  2. It does not need code refactor to ensure the hint would not be missed by Flink optimizer.

...