...
A Hive query gets converted into a sequence (it is more an a Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output comprises of has three parts:
- The Abstract Syntax Tree for the query
- The dependencies between the different stages of the plan
- The description of each of the stages
The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise of things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.
Example
As an example, consider the following EXPLAIN
query:
...
The plans of each Stage
Code Block STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: src Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: rand() type: double tag: -1 value expressions: expr: substr(value, 4) type: string Reduce Operator Tree: Group By Operator aggregations: expr: sum(UDFToDouble(VALUE.0)) keys: expr: KEY.0 type: string mode: partial1 File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.mapred.SequenceFileOutputFormat name: binary_table Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: /tmp/hive-zshao/67494501/106593589.10001 Reduce Output Operator key expressions: expr: 0 type: string sort order: + Map-reduce partition columns: expr: 0 type: string tag: -1 value expressions: expr: 1 type: double Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE.0) keys: expr: KEY.0 type: string mode: final Select Operator expressions: expr: 0 type: string expr: 1 type: double Select Operator expressions: expr: UDFToInteger(0) type: int expr: 1 type: double File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe name: dest_g1 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe name: dest_g1
In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.
Sort order
indicates the number of columns in key expressions that are used for sorting. Each "+
" represents one column sorted in ascending order, and each "-
" represents a column sorted in descending order.
...
- A mapping from table alias to Map Operator Tree -- This Tree – This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
- A Reduce Operator Tree -- Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation where as whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
The DEPENDENCY Clause
The use of DEPENDENCY
in the EXPLAIN
statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:
...
As can be seen, src is being accessed via parents v1 and v4.
The AUTHORIZATION Clause
The use of AUTHORIZATION
in the EXPLAIN
statement shows all entities needed to be authorized to execute the query and authorization failures if existsany exist. For example, for a query like:
...
Code Block |
---|
"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]} |
User-level Explain Output
Since HIVE-9780 in Hive 1.2.0, we support a user-level explain for Hive on Tez users (It which can be easily extended to Hive on MR or Spark users). After set hive.explain.user=true (default is false) , if the following query is sent, the user can see a much more clearly readable tree of operations.
Code Block |
---|
EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31' |
Code Block |
---|
Plan optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 2 File Output Operator [FS_8] compressed:false Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Group By Operator [GBY_6] | aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"] | outputColumnNames:["_col0","_col1"] | Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE |<-Map 1 [SIMPLE_EDGE] Reduce Output Operator [RS_5] sort order: Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE value expressions:_col0 (type: bigint), _col1 (type: bigint) Group By Operator [GBY_4] aggregations:["sum(_col0)","sum(_col1)"] outputColumnNames:["_col0","_col1"] Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator [SEL_2] outputColumnNames:["_col0","_col1"] Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] alias:src_orc_merge_test_part Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE |