Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Status

...

Page properties


Discussion thread

...

...

...

...

...

Released: <Flink Version>

...

s70cjbbr5565m44f4mfqo9w7xdq09cf1
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-27853

Release1.16


Motivation

In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.

...

Code Block
// an exception that the v2 is not existent will be thrown 
select /*+ BROADCAST(v2t1) */ t1.a
from (select * from test1) t1 join test2
on t1.a = test2.a

// use the equivalent sql following
create view t1 as select * from test1;
select /*+ BROADCAST(v2t1) */ t1.a
from t1 join test2 on t1.a = test2.a

...

For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].


advantage

shortcoming

proposal 1: writing order first

(we support)

  • easy to resolve the hint conflict
  • the first hint by the written order may be not the most optimal strategy

proposal 2: high priority first

(different join hints have different priority)


  • always choose the better join strategy by the priority we define.
  • user can write two or more two join hints and all of them can be considered by actual scene
  • hard to define the priority when a new join strategy will be added
  • user can't control the order of trying different join strategies

proposal 3: least cost first

  • always choose the faster join strategy by CBO
  • user can write two or more join hints and all of them can be considered by CBO
  • It depends on CBO and may lead to an uncontrollable behavior.
  • user can't control the order of trying different join strategies

proposal 4: only try the first hint

  • easy to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal may only try one join hint

proposal 5: ignore conflict hints

  • easy enough to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal will ignore all of them

References

[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing

...