Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


...

Page properties

...


Discussion

...

thread

...

...

...

...

Released: <Flink Version>

...

thread/s70cjbbr5565m44f4mfqo9w7xdq09cf1
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-27853

Release1.16


Motivation

In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.

...

The behavior of semantic validation is before optimizing the SQL, and the hebavior behavior when meets errors is to throw an exception.

...

Join Hint only affects the current query block, and does not affect the Join strategy in subquery and view. If the join hint really needs to be propagated into a view or subquery, we can discuss it in future,.

For example,

Code Block
// view
create view view1 as select t1.* from t1 join on t2 on t1.a = t2.a

// BROADCAST will not be propagated into view1,
// and view1 will use the planner's default join strategyt1 join on t2 on t1.a = t2.a

// BROADCAST will not be propagated into view1,
// and view1 will use the planner's default join strategy
select /*+ BROADCAST(t1) */ * from view1 join t3 on view1.a = t2.a

// the join in view1 will use the planner's default join strategy,
// and the join between view1 and t1 will use BROADCAST
select /*+ BROADCAST(t1) */ * from view1 join t1 on view1.a = t1.a

Support with table alias

Currently, join hint on alias is not supported, and user can create a view for this subquery with a specific view name and use this view name as an argument of the join hint.

For example,

Code Block
// an exception that the v2 is not existent will be thrown 
select /*+ BROADCAST(t1) */) */ t1.a
from (select * from view1test1) t1 join t3 test2
on view1t1.a = t2test2.a

// the join in view1 will use the planner'sequivalent default join strategy,
// and the join between view1 and t1 will use BROADCASTsql following
create view t1 as select * from test1;
select /*+ BROADCAST(t1) */ * t1.a
from view1t1 join t1test2 on view1t1.a = t1test2.a

Conflict in one same hints

...

For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].


advantage

shortcoming

proposal 1: writing order first

(we support)

  • easy to resolve the hint conflict
  • the first hint by the written order may be not the most optimal strategy

proposal 2: high priority first

(different join hints have different priority)


  • always choose the better join strategy by the priority we define.
  • user can write two or more two join hints and all of them can be considered by actual scene
  • hard to define the priority when a new join strategy will be added
  • user can't control the order of trying different join strategies

proposal 3: least cost first

  • always choose the faster join strategy by CBO
  • user can write two or more join hints and all of them can be considered by CBO
  • It depends on CBO and may lead to an uncontrollable behavior.
  • user can't control the order of trying different join strategies

proposal 4: only try the first hint

  • easy to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal may only try one join hint

proposal 5: ignore conflict hints

  • easy enough to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal will ignore all of them

References

[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing

...