...
- A new explain mode PLAN_ADVICE to the current ExplainDetails. EXPLAIN PLAN_ADVICE <query> will provide users with instant and actionable SQL advice per query, which is absent in the current explain mechanism.
- A new enum ExplainFormat to better categorize the output format of the explained result. The current EXPLAIN result presents users with a tree-style plain text format, which is only human-readable. By categorizing classifying the format, it is flexible to extend to a more structured format (like JSON) in the future to perform visualization or other analysis.
...
Introduce an interface PlanAnalyzer to analyze the optimized rel nodes generated by Planner. It encloses an inner interface, i.e., AnalyzedResult, to wrap a PlanAdvice with a list of affected rel node IDs.
...
Then Planner#explain will be invoked to generate the optimized physical plan. The The PlanAnalyzer will loop through the optimized physical plan (i.e., relNodes) to perform analysis.
...
KIND | NOTE |
---|---|
WARNING | It reveals potential data correctness risks, such as state expiration, and NDU issues |
ADVICE | It suggests potential SQL optimizer tuning configuration, such as enabling mini-batch to optimize to two_phase aggregation. |
SCOPE | NOTE |
---|---|
QUERY_LEVEL | It provides advice from a global view, targeting the entire query. |
NODE_LEVEL | It provides advice to a specific rel node. |
...
The current explain result is illustrated in the SQL Explain Result. When ExplainDetails contain PLAN_ADVICE, the output format will be like
Condition 1. The optmized optimized physical plan pattern matches the PlanAnalyzers. The analyzers generate some advice with SCOPE#NODE_LEVEL, then the attributes of relNode 's attributes will contain a new entry named "advice" with id. Then the advice content will be appended after the plan.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
== Abstract Syntax Tree == LogicalUnion(all=[true]) :- LogicalProject(count=[$0], word=[$1]) : +- LogicalFilter(condition=[LIKE($1, _UTF-16LE'F%')]) : +- LogicalTableScan(table=[[default_catalog, default_database, MyTable1]]) +- LogicalProject(count=[$0], word=[$1]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable2]]) == Optimized Physical Plan With Advice == Union(advice=[1], all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(advice=[2], table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) advice[1]: You might want to pay attention to this node because ... advice[2]: You might want to pay attention to this node because ... == Optimized Execution Plan == Union(all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) |
Conditition Condition 2. The optmized optimized physical plan pattern matches the PlanAnalyzers. The analyzers generate some advice with SCOPE#QUERY_LEVEL, then and the advice content will be appended after the plan.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
== Abstract Syntax Tree == LogicalUnion(all=[true]) :- LogicalProject(count=[$0], word=[$1]) : +- LogicalFilter(condition=[LIKE($1, _UTF-16LE'F%')]) : +- LogicalTableScan(table=[[default_catalog, default_database, MyTable1]]) +- LogicalProject(count=[$0], word=[$1]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable2]]) == Optimized Physical Plan With Advice == Union(all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) advice[1]: You might want to configure 'xxx' to improve .... advice[2]: You might want to configure 'yyy' to improve .... == Optimized Execution Plan == Union(all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) |
Condition 3. The optmized optimized physical plan pattern matches the PlanAnalyzers. The analyzers generate some advice with both SCOPE#QUERY_LEVEL and SCOPE#NODE_LEVEL, then NODE_LEVEL advice will be tagged to the target relNode's attributes, followed by QUERY_LEVEL advice.
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
== Abstract Syntax Tree == LogicalUnion(all=[true]) :- LogicalProject(count=[$0], word=[$1]) : +- LogicalFilter(condition=[LIKE($1, _UTF-16LE'F%')]) : +- LogicalTableScan(table=[[default_catalog, default_database, MyTable1]]) +- LogicalProject(count=[$0], word=[$1]) +- LogicalTableScan(table=[[default_catalog, default_database, MyTable2]]) == Optimized Physical Plan With Advice == Union(advice=[1], all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(advice=[2], table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) advice[1]: You might want to pay attention to this node because ... advice[2]: You might want to pay attention to this node because ... adivce[3]: You might want to configure 'yyy' to improve .... == Optimized Execution Plan == Union(all=[true], union=[count, word]) :- Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')]) : +- TableSourceScan(table=[[default_catalog, default_database, MyTable1]], fields=[count, word]) +- TableSourceScan(table=[[default_catalog, default_database, MyTable2]], fields=[count, word]) |
Conditition Condition 4. The optmized optimized physical plan pattern does not match the PlanAnalyzers. No advice is provided.
...
PlanAnalyzer is an internal interface to perform the analysis and provide advice. Introducing a new PlanAnalyzer is much similar to adding a new RelOptRule to RuleSet.
...
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
/** An implementation of {@link PlanAnalyzer} to collect state TTL sensitive rels. */ @Experimental public class StateExpireRiskAnalyzer implements PlanAnalyzer { public static final StateExpireRiskAnalyzer INSTANCE = new StateExpireRiskAnalyzer(); private static final PlanAdvice STATE_EXPIRE = new PlanAdvice( PlanAdvice.Kind.WARNING, PlanAdvice.Scope.NODE_LEVEL, String.format( "You might want to pay attention to this node because state expiration configuration '%s' is enabled.", ExecutionConfigOptions.IDLE_STATE_RETENTION.key())); private static final Set<String> STATE_TTL_INSENSITIVE_REL = new HashSet<>(); static { // excludes the state TTL insensitive rel } private StateExpireRiskAnalyzer() {} @Override public Optional<AnalyzedResult> analyze(FlinkRelNode rel) { List<Integer> targetRelIds = new ArrayList<>(); boolean enableStateTTL = ShortcutUtils.unwrapTableConfig(rel) .get(ExecutionConfigOptions.IDLE_STATE_RETENTION) .toMillis() > 0; if (rel instanceof FlinkPhysicalRel && enableStateTTL) { rel.accept( new RelShuttleImpl() { @Override public RelNode visit(RelNode other) { if (!STATE_TTL_INSENSITIVE_REL.contains( other.getClass().getCanonicalName())) { targetRelIds.add(other.getId()); } return super.visit(other); } }); if (!targetRelIds.isEmpty()) { return Optional.of( new AnalyzedResult() { @Override public PlanAdvice getAdvice() { return STATE_EXPIRE; } @Override public List<Integer> getTargetIds() { return targetRelIds; } }); } } return Optional.empty(); } } |
ExplainFormat
The problem of with the current output format is that it is a mixture of plain text (AST, Optimized Physical Plan, and Optimized Execution Plan) and JSON (Physical Execution Plan, via EXPLAIN JSON_EXECUTION_PLAN ), which is not structured and categorized.
By introducing ExplainFormat, we can better categorize the explain format. ExplainFormat#TEXT, which corresponds to the current output format, will be the default format to achieve back-wards backward compatibility.
In the future, we might introduce more formats along with the syntax EXPLAIN [ExplainDetails...] WITH [TEXT | JSON ] FORMAT <query>. While this is beyond the scope of this FLIP.
...
On first thought, we want to align the syntax with EXPLAIN JSON_EXECUTION_PLAN. However, it is unclear to know which plan needs to be analyzed. This brings out the second question, which plan should be analyzedinvestigated?
Analyze JSON Execution Plan v.s. Optimized Physical Plan
...
A: Currently, we have three ExplainDetail: CHANGLOG_MODE, ESTIMATED_COST, and JSON_EXECUTION_PLAN.
The relationship between them and the corresponding SQL-processing phase is illustrated below. Stream Graph is first excluded because we want to limit this analysis within the compile time. And from this Fig, we can tell that optimized rel is the closest to the original query.
...