...
Code Block |
---|
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
|
...
- A mapping from table alias to Map Operator Tree – This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
- A Reduce Operator Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
The CBO Clause
The CBO clause outputs the plan generated by Calcite optimizer. It can optionally include information about the cost of the plan using Calcite default cost model and cost model used for join reordering. Since Hive release 4.0.0 (HIVE-17503 / HIVE-21184).
Syntax: EXPLAIN [FORMATTED] CBO [COST|JOINCOST]
- COST option prints the plan and cost calculated using Calcite default cost model.
- JOINCOST option prints the plan and cost calculated using the cost model used for join reordering.
The optional clause defaults are not ONLY and SUMMARY.
For example
Code Block |
---|
EXPLAIN CBO
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
sr_store_sk AS ctr_store_sk,
SUM(SR_FEE) AS ctr_total_return
FROM store_returns, date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year =2000
GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'NM'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100
|
Will produce output like this.
Code Block |
---|
CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
HiveProject(c_customer_id=[$1])
HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(c_customer_sk=[$0], c_customer_id=[$1])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, customer]], table:alias=[customer])
HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))])
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
HiveProject(s_store_sk=[$0])
HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))])
HiveTableScan(table=[[default, store]], table:alias=[store])
HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0])
HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)])
HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))])
HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
HiveProject(d_date_sk=[$0])
HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim]) |
Code Block |
---|
EXPLAIN FORMATTED LOCKS <sql>
|
is also supported which will produce JSON encoded output.
The AST Clause
Outputs the query's Abstract Syntax Tree.
...