Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Status

...

Page properties


Discussion thread

...

...

...

...

...

Released: <Flink Version>

...

thread/s70cjbbr5565m44f4mfqo9w7xdq09cf1
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-27853

Release1.16


Motivation

In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.

...

For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].


advantage

shortcoming

proposal 1: writing order first

(we support)

  • easy to resolve the hint conflict
  • the first hint by the written order may be not the most optimal strategy

proposal 2: high priority first

(different join hints have different priority)


  • always choose the better join strategy by the priority we define.
  • user can write two or more two join hints and all of them can be considered by actual scene
  • hard to define the priority when a new join strategy will be added
  • user can't control the order of trying different join strategies

proposal 3: least cost first

  • always choose the faster join strategy by CBO
  • user can write two or more join hints and all of them can be considered by CBO
  • It depends on CBO and may lead to an uncontrollable behavior.
  • user can't control the order of trying different join strategies

proposal 4: only try the first hint

  • easy to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal may only try one join hint

proposal 5: ignore conflict hints

  • easy enough to resolve the hint conflict
  • sometimes user want to try both two join hints but this proposal will ignore all of them

References

[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing

...