...

To enable hash lookup join, user only needs specify a new hint (useSHUFFLE_hashHASH) in select clause in query, just like use_hash hint in Oraclespark[2] sql.

Code Block

language	sql

SELECT /*+ USESHUFFLE_HASH('Orders', 'Customers') */ o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

...

Table names in use_hash hint is required to avoid mistake hint propagation if there are multiple lookup joins. In theory, it's better to support both table names and alias names, but the alias name of subquery or table would not be lost by the sql converter and sql optimizer. So here we only support table names.
The hint only provides a suggestion to the optimization, it is not an enforcer.

Proposed Changes

Define Hint Strategy

"USESHUFFLE_HASH", this hint can only be applied to CORRELATE relations which satisfy the following conditions:

the correlate is a look up join, other correlate would ignore the hint
the correlate has "Orders" and "Customers" as the input table names. In theory, it's better to support both table names and alias names, but the alias name of subquery or table would not be lost by the sql converter and sql optimizer. So here we only support table names.

The code below shows how we define hint strategy for hash lookupJoin.

Code Block

language	java
title	USE_HASH hint strategy

      builder
        .hintStrategy("USESHUFFLE_HASH",
            HintStrategy.builder(
                HintPredicates.and(HintPredicates.CORRELATE, isLookupJoin(), joinWithFixedTableNamelookupJoinWithFixedTableName())))
        .build();

Note,

it has a blocker on Calcite version upgrade.

Calcite would translate above sql into Correlate instead of Join . Correlate is not a kind of RelNode that can attach RelHint s currently. The hint could not be propagated to the Correlate nodes, eitheruntil 1.30.0 version.

I've report a

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	CALCITE-4967

in Calcite, which has been merged and would be published in CALCITE 1.30.0 version.

...

Besides, we would add integration tests for connectors to verify it can cooperate with existing source/sink implementations.

...

Rejected Alternatives

Hint Syntax

We we could use name 'SHUFFLEuse_HASHhash' just like spark[2] sqlhint in Oracle.

Code Block

language	sql

SELECT /*+ SHUFFLEUSE_HASH('Orders', 'Customers') */ o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

...

There is a simpler but a little hack hacky implementation, this is also what we apply in the internal version.

That is, propagating 'useSHUFFLE_hashHASH' hint to TableScan with matched table names. In this way, the hint would not be missed by Flink optimizer until it needs the hint in LookupJoinRules.

...

Compared with the previous solution, this solution has two advantages:

It does not need Calcite version upgrade, see
Jira
server ASF JIRA
serverId 5aa69414-a9e9-3523-82ec-879b028fb15b
key CALCITE-4967
It does not need code refactor to ensure the hint would not be missed by Flink optimizer.

...

Page tree

Versions Compared

Old Version 12

New Version 13

Key

Proposed Changes

Define Hint Strategy

Rejected Alternatives

Hint Syntax

Page tree

Page History

Versions Compared

Old Version 12

New Version 13

Key

Proposed Changes

Define Hint Strategy

Rejected Alternatives

Hint Syntax