...

  • Parse and SemanticAnalysis (ql/parse) - This component contains the code for parsing SQL, converting it into Abstract Syntax Trees, converting the Abstract Syntax Trees into Operator Plans, and finally converting the operator plans into a directed graph of tasks which are executed by Driver.java.
  • Optimizer (ql/optimizer) - This component contains some simple rule-based optimizations, like pruning non-referenced columns from table scans (column pruning), that the Hive Query Processor does while converting SQL to a series of map/reduce tasks.
  • Plan Components (ql/plan) - This component contains the classes (called descriptors) that are used by the compiler (Parser, SemanticAnalysis and Optimizer) to pass information to the operator trees used by the execution code.
  • MetaData Layer (ql/metadata) - This component is used by the query processor to interface with the MetaStore in order to retrieve information about tables, partitions and the columns of a table. This information is used by the compiler to compile SQL to a series of map/reduce tasks.
  • Map/Reduce Execution Engine (ql/exec) - This component contains all the query operators and the framework that is used to invoke those operators from within the map/reduce tasks.
  • Hadoop Record Readers, Input and Output Formatters for Hive (ql/io) - This component contains the record readers and the input and output formatters that Hive registers with a Hadoop job.
  • Sessions (ql/session) - A rudimentary session implementation for Hive.
  • Type interfaces (ql/typeinfo) - This component provides all the type information for table columns that is retrieved from the MetaStore and the SerDes.
  • Hive Function Framework (ql/udf) - Framework and implementation of Hive operators, Functions and Aggregate Functions. This component also contains the interfaces that a user can implement to create user-defined functions (a minimal sketch follows this list).
  • Tools (ql/tools) - Some simple tools provided by the query processing framework. Currently, this component contains the implementation of the lineage tool that can parse a query and show its source and destination tables.
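
To illustrate the user-facing side of the function framework, here is a minimal sketch of a user-defined function. It assumes the classic UDF bridge class org.apache.hadoop.hive.ql.exec.UDF; the package and class names (com.example.hive.udf, ToUpper) are hypothetical and used only for illustration, not part of the shipped ql/udf sources.

Code Block
// Hypothetical example package; not part of the ql/udf sources.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Minimal UDF sketch: a class extending UDF exposes one or more evaluate()
 * methods, and the function framework selects the matching overload by
 * reflection and calls it once per input row.
 */
public class ToUpper extends UDF {
  // Reuse the output object across rows, a common Hadoop/Hive idiom.
  private final Text result = new Text();

  // Upper-cases the input string; returns null for null input.
  public Text evaluate(Text input) {
    if (input == null) {
      return null;
    }
    result.set(input.toString().toUpperCase());
    return result;
  }
}

Once packaged into a jar, such a function can typically be registered from a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION; aggregate functions follow a similar pattern against the aggregate-function interfaces in this component.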

A helpful overview of the Hive query processor can be found in this Hive Anatomy slide deck.

Compiler

Parser

TypeChecking

...

Plan

Operators

UDFs and UDAFs


Compiling and Running Hive

...

Run Hive from the command line with '$HIVE_HOME/bin/hive', where $HIVE_HOME is typically build/dist under the top-level directory of your Hive repository.

...

If Hive fails at runtime, try 'ant very-clean package' to delete the Ivy cache before rebuilding.

...

Code Block
export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp \
    -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse \
    -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'

Then you can run 'build/dist/bin/hive' and it will work against your local file system.

...