Status
...
Page properties | |
---|---|
|
...
...
...
|
...
...
Released: <Flink Version>
...
|
Motivation
In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.
...
Code Block |
---|
// an exception that the v2 is not existent will be thrown select /*+ BROADCAST(v2t1) */ t1.a from (select * from test1) t1 join test2 on t1.a = test2.a // use the equivalent sql following create view t1 as select * from test1; select /*+ BROADCAST(v2t1) */ t1.a from t1 join test2 on t1.a = test2.a |
...
For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].
advantage | shortcoming | |
proposal 1: writing order first (we support) |
|
|
proposal 2: high priority first (different join hints have different priority) |
|
|
proposal 3: least cost first |
|
|
proposal 4: only try the first hint |
|
|
proposal 5: ignore conflict hints |
|
|
References
[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing
...