Page History

...

Discussion thread: here (<- link to to https://mail-archiveslists.apache.org/mod_mbox/flink-dev/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: <Flink Version>Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.

...

For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].

	advantage	shortcoming
proposal 1: writing order first (we support)	easy to resolve the hint conflict	the first hint by the written order may be not the most optimal strategy
proposal 2: high priority first (different join hints have different priority)	always choose the better join strategy by the priority we define. user can write two or more two join hints and all of them can be considered by actual scene	hard to define the priority when a new join strategy will be added user can't control the order of trying different join strategies
proposal 3: least cost first	always choose the faster join strategy by CBO user can write two or more join hints and all of them can be considered by CBO	It depends on CBO and may lead to an uncontrollable behavior. user can't control the order of trying different join strategies
proposal 4: only try the first hint	easy to resolve the hint conflict	sometimes user want to try both two join hints but this proposal may only try one join hint
proposal 5: ignore conflict hints	easy enough to resolve the hint conflict	sometimes user want to try both two join hints but this proposal will ignore all of them

References

[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing

...

Page tree

Versions Compared

Old Version 8

New Version 9

Key

Motivation

References