Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

...

  • A mapping from table alias to Map Operator Tree  This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
  • A Reduce Operator Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.

The CBO Clause

The CBO clause outputs the plan generated by Calcite optimizer. It can optionally include information about the cost of the plan using Calcite default cost model and cost model used for join reordering. Since Hive release 4.0.0 (HIVE-17503HIVE-21184).

Syntax: EXPLAIN [FORMATTED] CBO [COST|JOINCOST]

  • COST option prints the plan and cost calculated using Calcite default cost model.
  • JOINCOST option prints the plan and cost calculated using the cost model used for join reordering.

The optional clause defaults are not ONLY and SUMMARY.

For example

Code Block
EXPLAIN CBO
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
  sr_store_sk AS ctr_store_sk,
  SUM(SR_FEE) AS ctr_total_return
  FROM store_returns, date_dim
  WHERE sr_returned_date_sk = d_date_sk
    AND d_year =2000
  GROUP BY sr_customer_sk, sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'NM'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100

Will produce output like this.

Code Block
CBO PLAN:
HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
  HiveProject(c_customer_id=[$1])
    HiveJoin(condition=[AND(=($3, $7), >($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
      HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[not available])
        HiveProject(c_customer_sk=[$0], c_customer_id=[$1])
          HiveFilter(condition=[IS NOT NULL($0)])
            HiveTableScan(table=[[default, customer]], table:alias=[customer])
        HiveJoin(condition=[=($3, $1)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7), IS NOT NULL($3))])
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
                HiveProject(d_date_sk=[$0])
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
          HiveProject(s_store_sk=[$0])
            HiveFilter(condition=[AND(=($24, _UTF-16LE'NM'), IS NOT NULL($0))])
              HiveTableScan(table=[[default, store]], table:alias=[store])
      HiveProject(_o__c0=[*(/($1, $2), 1.2)], ctr_store_sk=[$0])
        HiveAggregate(group=[{1}], agg#0=[sum($2)], agg#1=[count($2)])
          HiveProject(sr_customer_sk=[$0], sr_store_sk=[$1], $f2=[$2])
            HiveAggregate(group=[{1, 2}], agg#0=[sum($3)])
              HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(sr_returned_date_sk=[$0], sr_customer_sk=[$3], sr_store_sk=[$7], sr_fee=[$14])
                  HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($7))])
                    HiveTableScan(table=[[default, store_returns]], table:alias=[store_returns])
                HiveProject(d_date_sk=[$0])
                  HiveFilter(condition=[AND(=($6, 2000), IS NOT NULL($0))])
                    HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])

Code Block
EXPLAIN FORMATTED LOCKS <sql>

is also supported which will produce JSON encoded output.

The AST Clause

Outputs the query's Abstract Syntax Tree.

...