...
Page properties |
---|
...
|
...
|
...
...
|
...
...
Released: <Flink Version>
...
|
Motivation
In SQL jobs, join is a very common computing operators. Currently, Flink can automatically select an appropriate join strategy for the SQL based on statistical information of the source table, and generate an execution plan to run the job. However, due to various reasons, such as missing or incorrect statistical information, the execution plan generated by the optimizer is not optimal, and the optimizer will use the wrong join strategy, resulting in a poor final optimization effort or even failing to execute.
...
The behavior of semantic validation is before optimizing the SQL, and the hebavior behavior when meets errors is to throw an exception.
...
Join Hint only affects the current query block, and does not affect the Join strategy in subquery and view. If the join hint really needs to be propagated into a view or subquery, we can discuss it in future,.
For example,
Code Block |
---|
// view create view view1 as select t1.* from t1 join on t2 on t1.a = t2.a // BROADCAST will not be propagated into view1, // and view1 will use the planner's default join strategyt1 join on t2 on t1.a = t2.a // BROADCAST will not be propagated into view1, // and view1 will use the planner's default join strategy select /*+ BROADCAST(t1) */ * from view1 join t3 on view1.a = t2.a // the join in view1 will use the planner's default join strategy, // and the join between view1 and t1 will use BROADCAST select /*+ BROADCAST(t1) */ * from view1 join t1 on view1.a = t1.a |
Support with table alias
Currently, join hint on alias is not supported, and user can create a view for this subquery with a specific view name and use this view name as an argument of the join hint.
For example,
Code Block |
---|
// an exception that the v2 is not existent will be thrown select /*+ BROADCAST(t1) */) */ t1.a from (select * from view1test1) t1 join t3 test2 on view1t1.a = t2test2.a // the join in view1 will use the planner'sequivalent default join strategy, // and the join between view1 and t1 will use BROADCASTsql following create view t1 as select * from test1; select /*+ BROADCAST(t1) */ * t1.a from view1t1 join t1test2 on view1t1.a = t1test2.a |
Conflict in one same hints
...
For join hint conflicts, we have the many different behaviors to choose from. The following are the advantages and shortcomings of several behaviors. For specific examples corresponding to each behavior, please refer to Appendix [1].
advantage | shortcoming | |
proposal 1: writing order first (we support) |
|
|
proposal 2: high priority first (different join hints have different priority) |
|
|
proposal 3: least cost first |
|
|
proposal 4: only try the first hint |
|
|
proposal 5: ignore conflict hints |
|
|
References
[1] https://docs.google.com/document/d/19PL77ZggPIhz7FMZv2Yx7w_cnYyvmTc-mxUpJYv_934/edit?usp=sharing
...