Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Table of Contents
maxLevel1

Abstract

Apache Hadoop is a framework for the distributed processing of large data sets using clusters of computers typically composed of commodity hardware. Over last few years Apache Hadoop has become the de facto platform for distributed data processing using commodity hardware. Apache Hive is a popular SQL interface for data processing using Apache Hadoop.

...

Join reordering and join algorithm selection are few of the optimizations that can benefit from a cost based optimizer. Cost based optimizer would free up user from having to rearrange joins in the right order or from having to specify join algorithm by using query hints and configuration options. This can potentially free up users to model their reporting and ETL needs close to business process without having to worry about query optimizations.

 

Optiq Calcite is an open source cost based query optimizer and query execution framework. Optiq Calcite currently has more than fifty query optimization rules that can rewrite query tree, and an efficient plan pruner that can select cheapest query plan in an optimal manner. In this paper we discuss how Optiq Calcite can be used to introduce Cost Based Logical Optimizer (CBO) in to Apache Hive.

...

CBO will be introduced in to Hive in a Phased manner. In the first phase, Optiq Calcite would be used to reorder joins and to pick right join algorithm so as to reduce query latency. Table cardinality and Boundary statistics will be used for this cost based optimizations.

...

  • How to order Join
  • What algorithm to use for a given Join
  • Should the intermediate result be persisted or should it be recomputed on operator failure.
  • The degree of parallelism at any operator (specifically number of reducers to use).
  • Semi Join selection

Optiq Calcite is an open source, Apache Licensed, query planning and execution framework. Many pieces of Optiq Calcite are derived from Eigenbase Project. Optiq Calcite has optional JDBC server, query parser and validator, query optimizer and pluggable data source adapters. One of the available Optiq Calcite optimizer is a cost based optimizer based on volcano paper. Currently different pieces of Optiq Calcite is used in following projects/products:

  • Apache Drill
  • Cascading (Lingual)
  • Lucid DB
  • Mondrian/Pentaho

 Image Added

Optiq Calcite currently has over fifty cost based optimization rules. Some of the prominent cost based optimization rules are listed below:

...

In this document we propose to use Optiq’s Calcite’s cost based optimizer, Volcano, to perform Cost Based Optimizations in Hive. We propose to implement Optiq Calcite based CBO in a phased manner. Note here that proposal is to use Optiq’s Calcite’s optimizer only and nothing else. Listed below are the envisioned stages of introducing CBO in to Hive using OptiqCalcite:

  • Phase1 – Join Reordering & Join algorithm Selection
    • Table cardinality and boundary statistics will be used to compute operator cardinality.
    • Hive operator tree will be converted to Optiq Calcite operator tree.
    • Volcano optimizer in Optiq Calcite will be used to rearrange joins and to pick the join algorithm.
    • Optimized Optiq Calcite operator tree would be converted back to Hive AST and will be executed as before. So all of the Hive’s existing optimizations would run on top of Optiq Calcite optimized SQL.
  • Phase2 – Add support for Histograms, use other optimizations in OptiqCalcite
    • Introduce space efficient histograms
    • Change operator cardinality computation to use histograms
    • Register additional optimization rules available in Optiq Calcite like the ones listed above.
  • Phase3 – Code restructuring so that Optiq Calcite generates optimized Hive Operator tree
    • Unlike phase1 Hive AST would be directly translated into Optiq Calcite operator tree.
    • Optimize Optiq Calcite operator tree using Volcano optimizer.
    • Convert optimized Optiq Calcite operator tree back into Hive Operator tree. This is unlike phase1 where optimized Optiq Calcite operator tree is converted to Hive AST.

...

CBO will be introduced in to Hive in three different phases. Following provides high-level overview of these phases: 

Phase 1

 Image Added

Statistics:

  • Table Cardinality
  • Column Boundary Stats: min, max, avg, number of distinct values

...

  • Join ordering
  • Join Algorithm

 

Restrictions:

  • Optiq Calcite CBO will be used only for select expressions
  • Optiq Calcite CBO won’t be used if select expression contains any of the following operators:
    • Sort By

...

  • UDTF (Table Functions)
  • PTF (Partitioned Table Functions)

 

Optiq Calcite related enhancements:

  • Introduce Operators to represent hive relational operators. Table Scan, Join, Union, Select, Filter, Group By, Distinct, Order By. These operators would implement a calling convention with physical cost for each of these operators.
  • Introduce rules to convert Joins from CommonJoin to MapJoin, MapJoin to BucketJoin, BucketJoin to SMBJoin, CommonJoin to SkewJoin.
  • Introduce rule to merge joins so that a single join operator will represent multi-way join (similar to MergedJoin in Hive).
  • Merged-Join in Hive will be translated to MultiJoinRel in OptiqCalcite.

 

Phase 2

Image Added

Statistics:

  • Histograms

 

Cost Based Optimizations:

  • Join ordering based on histograms
  • Join Algorithm – histograms are used for estimating join selectivity
  • Take advantage of additional optimizations in OptiqCalcite. The actual rules to use is TBD.

Phase 3

Image Added

Configuration

The configuration parameter hive.cbo.enable determines whether cost-based optimization is enabled or not.

Proposed Cost Model

Hive employs parallel-distributed query execution using Hadoop cluster. This implies for a given query operator tree different operators could be running on different nodes. Also same operator may be running in parallel on different nodes in the cluster, processing different partitions of the original relation. This parallel execution model induces high I/O and CPU costs. Hive query execution cost tends to be more I/O centric due to following reasons.

...

Average size of the tuple in relation and cardinality of the relation will be used to estimate resources needed to hold a relation in memory. Memory needed to hold a relation is used to decide whether certain join algorithms, like Map/Bucket Join, can be used.

 

Volcano optimizer in Optiq Calcite compares cost of two equivalent query plans by getting cost of each operator in the tree and summing them up to find cumulative cost. Plan with lowest cumulative cost is picked as the best plan to execute the query. “VolcanoCost” is the Java class representing cost for Optiq’s Calcite’s Volcano optimizer. “VolcanoCost” comparison operators seem to take into consideration only the number of rows to decide if one cost is less than other.

...

5. Phase 1 – Work Items

  1. Walk the Hive Op Tree OP tree and make sure that OP tree doesn’t contain any op that cannot be translated into Optiq Calcite (Lateral Views, PTF, Cubes & Rollups, Multi Table Insert).
  2. Walk the Hive OP tree and introduce cast functions to make sure that all of the comparisons (implicit & explicit) are strictly type safe (the type must be same on both sides).
  3. Implement Optiq Calcite operators specific to Hive that would do cost computation and cloning.
  4. Convert the Hive OP tree to Optiq Op Calcite OP tree.
    1. Convert Hive Types to Optiq Calcite types
    2. Convert Hive Expressions to Optiq Calcite expressions
    3. Convert Hive Operator to Optiq Calcite operator
    4. Handle Hidden Columns
    5. Handle columns introduced by ReduceSink (for shuffling)
    6. Handle Join condition expressions stashed in Reducesink op
    7. Handle filters in Join Conditions
    8. Convert Hive Semi Join to OptiqCalcite
    9. Attach cost to operators
    10. Alias the top-level query projections to query projections that user expects.
  5. Optimize the Optiq Calcite OP tree using Volcano Optimizer.
    1. Implement Rules to convert Joins to Hive Join algorithms.
      1. Common Join -> Map Join
      2. Map Join -> Bucket Map Join
      3. Common Join -> Bucket Map Join
      4. Bucket Map Join ->  SMB Join
      5. Common Join -> Skew Join
  6. Walk the Optimized Optiq Calcite OP tree and introduce derived tables to convert OP tree to SQL.
    1. Generate unique table (including derived table) aliases
  7. Walk the OP tree and convert in to AST.
    1. Stash Join algorithm in AST as query Hints
  8. Modify Plan Generator to pay attention to Optiq Calcite query hints
  9. Rerun Hive optimizer and generate the execution plan (this second pass would not invoke Optiq Calcite optimizer).

Open Issues

  1. CBO needs to differentiate between types of IPC (Durable-Local-FS vs, Durable-HDFS, Memory vs. Streaming) 

...