Page History

...

A Hive query gets converted into a sequence (it is more an a Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output comprises of has three parts:

The Abstract Syntax Tree for the query
The dependencies between the different stages of the plan
The description of each of the stages

The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise of things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.

Example

As an example, consider the following EXPLAIN query:

...

The plans of each Stage

Code Block

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value, 4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1

In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.
Sort order indicates the number of columns in key expressions that are used for sorting. Each "+" represents one column sorted in ascending order, and each "-" represents a column sorted in descending order.

...

A mapping from table alias to Map Operator Tree -- This Tree – This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
A Reduce Operator Tree -- Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation where as whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.

The DEPENDENCY Clause

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

...

As can be seen, src is being accessed via parents v1 and v4.

The AUTHORIZATION Clause

The use of AUTHORIZATION in the EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if existsany exist. For example, for a query like:

...

Code Block

"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}

User-level Explain Output

Since HIVE-9780 in Hive 1.2.0, we support a user-level explain for Hive on Tez users (It which can be easily extended to Hive on MR or Spark users). After set hive.explain.user=true (default is false) , if the following query is sent, the user can see a much more clearly readable tree of operations.

Code Block
EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31'

Code Block

Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 2
         File Output Operator [FS_8]
            compressed:false
            Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
            Group By Operator [GBY_6]
            |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            |<-Map 1 [SIMPLE_EDGE]
               Reduce Output Operator [RS_5]
                  sort order:
                  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                  value expressions:_col0 (type: bigint), _col1 (type: bigint)
                  Group By Operator [GBY_4]
                     aggregations:["sum(_col0)","sum(_col1)"]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                     Select Operator [SEL_2]
                        outputColumnNames:["_col0","_col1"]
                        Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                        TableScan [TS_0]
                           alias:src_orc_merge_test_part
                           Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE

Space shortcuts

Child pages

Versions Compared

Old Version 13

New Version 14

Key

Example

The DEPENDENCY Clause

The AUTHORIZATION Clause

User-level Explain Output