...
A new query hint, 'LOOKUP', with different hint options ('async'='true|false', 'miss-retry'='true|false') to cover all related functionality (including FLINK-27625 and the discussion of the connector option 'lookup.async' in FLIP-221 [4]). Compared with multiple hints that each cover a subset of the functionality, a single hint may be easier for users to understand and use, and specific parameters can be found quickly in the documentation.
The available hint options for each mode:
mode | supported hint options |
async | 'async'='true' 'output-mode'='ordered|allow-unordered' 'capacity'='100' 'timeout'='180s' |
retry | 'miss-retry'='true' 'retry-strategy'='fixed-delay' 'delay'='10s' 'max-attempts'='3' |
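As a small illustration of how the options in the table combine into hint text, the following sketch renders a LOOKUP hint from an ordered option map. The class `LookupHintText` and its `render` method are hypothetical helpers written for this example; they are not part of Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Flink: renders a LOOKUP hint from an option map. */
final class LookupHintText {

    /** Quotes each key and value and joins the pairs in the map's iteration order. */
    static String render(Map<String, String> options) {
        return options.entrySet().stream()
                .map(e -> "'" + e.getKey() + "'='" + e.getValue() + "'")
                .collect(Collectors.joining(", ", "LOOKUP(", ")"));
    }

    public static void main(String[] args) {
        // Retry-mode options taken from the table above; 'dim1' is an example table name.
        Map<String, String> retry = new LinkedHashMap<>();
        retry.put("table", "dim1");
        retry.put("miss-retry", "true");
        retry.put("retry-strategy", "fixed-delay");
        retry.put("delay", "10s");
        retry.put("max-attempts", "3");
        System.out.println(render(retry));
    }
}
```

A `LinkedHashMap` is used so that the options appear in the order they were added, which keeps the rendered hint readable.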
...
For connectors that can support both async and sync lookup, we advise connector developers to implement both the sync and async interfaces if both capabilities have suitable use cases. The planner prefers the async one by default, and users can set 'async'='true|false' via the LOOKUP query hint to suggest a choice to the planner.
Otherwise, choose one interface to implement.
Because query hints work in a best-effort manner, if a user specifies a hint with an invalid option, the query plan remains unchanged. For example, with LOOKUP('table'='customer', 'async'='true'), if the backend lookup source implements only the sync lookup function, the async lookup hint has no effect.
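The selection rules described here can be sketched as a small decision function. This is only an illustration of the stated behavior, not planner code: `LookupModeChooser`, `Mode`, and `choose` are hypothetical names. The symmetric case (a hint of 'async'='false' on a source that implements only async lookup) is assumed to be ignored in the same best-effort way.

```java
/** Hypothetical sketch of the described selection rules; not Flink planner code. */
final class LookupModeChooser {

    enum Mode { SYNC, ASYNC }

    /**
     * @param supportsSync  whether the source implements the sync lookup interface
     * @param supportsAsync whether the source implements the async lookup interface
     * @param asyncHint     value of the 'async' hint option, or null when no hint is given
     */
    static Mode choose(boolean supportsSync, boolean supportsAsync, Boolean asyncHint) {
        if (!supportsSync && !supportsAsync) {
            throw new IllegalArgumentException("source implements no lookup interface");
        }
        // Only one interface is implemented: the hint cannot change the plan.
        if (!supportsAsync) {
            return Mode.SYNC;
        }
        if (!supportsSync) {
            return Mode.ASYNC; // assumption: mirrors the sync-only case
        }
        // Both are implemented: async is the default, and the hint may override it.
        return (asyncHint == null || asyncHint) ? Mode.ASYNC : Mode.SYNC;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true));  // sync-only source, async hint ignored
        System.out.println(choose(true, true, false));  // both available, hint selects sync
    }
}
```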
...
LOOKUP('table'='dim1', 'async'='true', 'output-mode'='allow-unordered', 'capacity'='100', 'timeout'='180s')
For example, if the job-level configuration is:
table.exec.async-lookup.output-mode: ORDERED
table.exec.async-lookup.buffer-capacity: 100
table.exec.async-lookup.timeout: 180s
then the following hints:
1. LOOKUP('table'='dim1', 'async'='true', 'output-mode'='allow-unordered')
2. LOOKUP('table'='dim1', 'async'='true', 'timeout'='300s')
are equivalent to:
1. LOOKUP('table'='dim1', 'async'='true', 'output-mode'='allow-unordered', 'capacity'='100', 'timeout'='180s')
2. LOOKUP('table'='dim1', 'async'='true', 'output-mode'='ordered', 'capacity'='100', 'timeout'='300s')
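The fallback from hint options to job-level configuration shown in this example can be sketched as a simple merge. The class `LookupHintDefaults` and its methods are hypothetical and exist only for illustration. The conversion of the configuration value (for example ORDERED) to the hint spelling (ordered) by lowercasing and replacing underscores with hyphens is an assumption of this sketch, not confirmed planner behavior.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: fills unset async hint options from job-level configuration. */
final class LookupHintDefaults {

    static Map<String, String> resolve(Map<String, String> hint, Map<String, String> jobConfig) {
        Map<String, String> resolved = new LinkedHashMap<>(hint);
        if (!"true".equals(hint.get("async"))) {
            return resolved; // the async options only matter for async lookup
        }
        String mode = jobConfig.get("table.exec.async-lookup.output-mode");
        if (mode != null) {
            // Assumed conversion from config spelling to hint spelling.
            resolved.putIfAbsent("output-mode", mode.toLowerCase(Locale.ROOT).replace('_', '-'));
        }
        copyIfAbsent(resolved, "capacity", jobConfig.get("table.exec.async-lookup.buffer-capacity"));
        copyIfAbsent(resolved, "timeout", jobConfig.get("table.exec.async-lookup.timeout"));
        return resolved;
    }

    /** Copies a job-level value only when the hint did not set the option itself. */
    private static void copyIfAbsent(Map<String, String> target, String key, String value) {
        if (value != null) {
            target.putIfAbsent(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConfig = new LinkedHashMap<>();
        jobConfig.put("table.exec.async-lookup.output-mode", "ORDERED");
        jobConfig.put("table.exec.async-lookup.buffer-capacity", "100");
        jobConfig.put("table.exec.async-lookup.timeout", "180s");

        // Second hint from the example: only the timeout is overridden.
        Map<String, String> hint = new LinkedHashMap<>();
        hint.put("table", "dim1");
        hint.put("async", "true");
        hint.put("timeout", "300s");
        System.out.println(resolve(hint, jobConfig));
    }
}
```

Values given in the hint always win; the job-level configuration only fills the options the hint leaves unset.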
...
3. FLIP-204: Introduce Hash Lookup Join
4. https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq