...
Discussion thread: here (<- link to https://mail-archiveslists.apache.org/mod_mbox/flink-dev/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h)
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
In SQL jobs, join is a very common operator. Currently, Flink automatically selects a join strategy for a SQL query based on the statistics of the source tables and generates an execution plan to run the job. However, for various reasons, such as missing or incorrect statistics, the execution plan generated by the optimizer is not always optimal: the optimizer may choose the wrong join strategy, which results in a poorly optimized job or even a job that fails to execute.
...
For join hint conflicts, there are several possible behaviors to choose from. The advantages and shortcomings of each behavior are listed below. For specific examples corresponding to each behavior, please refer to Appendix [1].
| | advantage | shortcoming |
| proposal 1: writing order first (we support) | | |
| proposal 2: high priority first (different join hints have different priority) | | |
| proposal 3: least cost first | | |
| proposal 4: only try the first hint | | |
| proposal 5: ignore conflict hints | | |
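To make the difference between some of these behaviors concrete, here is a minimal sketch of how proposals 1, 4 and 5 might resolve a list of conflicting hints on the same join. It is only one reading of the proposal names, not the FLIP's implementation; Appendix [1] holds the authoritative examples. The function names, the `is_applicable` callback and the hint names used in the usage note are all hypothetical, and "fall back to the optimizer" is modeled as returning `None`.

```python
from typing import Callable, Optional, Sequence


def resolve_writing_order_first(
    hints: Sequence[str], is_applicable: Callable[[str], bool]
) -> Optional[str]:
    """Assumed reading of proposal 1: try hints in the order they were
    written and use the first one that can actually be applied."""
    for hint in hints:
        if is_applicable(hint):
            return hint
    return None  # no hint applies: the optimizer chooses on its own


def resolve_only_first_hint(
    hints: Sequence[str], is_applicable: Callable[[str], bool]
) -> Optional[str]:
    """Assumed reading of proposal 4: only the first written hint is
    considered; later hints are never tried."""
    if hints and is_applicable(hints[0]):
        return hints[0]
    return None


def resolve_ignore_conflicts(
    hints: Sequence[str], is_applicable: Callable[[str], bool]
) -> Optional[str]:
    """Assumed reading of proposal 5: if more than one distinct hint is
    given for the same join, all of them are ignored."""
    if len(set(hints)) != 1:
        return None
    return hints[0] if is_applicable(hints[0]) else None
```

For example, with the hints `["BROADCAST", "SHUFFLE_HASH"]` and a callback that reports only `SHUFFLE_HASH` as applicable, the first function returns `"SHUFFLE_HASH"`, while the other two return `None` and leave the choice to the optimizer.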
References
[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing
...