...

  • Metastore Server - This is the Thrift server (interface defined in metastore/if/hive_metastore.if) that services metadata requests from clients. It delegates most of the requests to the underlying metadata store and to the Hadoop file system, which contains the data.
  • Object Store - The ObjectStore class handles access to the actual metadata, which is stored in the SQL store. The current implementation uses the JPOX ORM solution, which is based on the JDO specification. It can be used with any database that is supported by JPOX. New metastores (file based or XML based) can be added by implementing the MetaStore interface. FileStore is a partial implementation of an older version of the metastore and may be deprecated soon.
  • Metastore Client - There are Python, Java, and PHP Thrift clients in metastore/src. The generated Java client is extended by HiveMetaStoreClient, which is used by the Query Processor (ql/metadata). This is the main interface to all other Hive components.

Query Processor

The following are the main components of the Hive Query Processor:

  • Parse and SemanticAnalysis (ql/parse) - This component contains the code for parsing SQL, converting it into Abstract Syntax Trees, converting the Abstract Syntax Trees into Operator Plans and finally converting the operator plans into a directed graph of tasks which are executed by Driver.java.
  • Optimizer (ql/optimizer) - This component contains some simple rule-based optimizations, such as pruning unreferenced columns from table scans (column pruning), that the Hive Query Processor applies while converting SQL to a series of map/reduce tasks.
  • Plan Components (ql/plan) - This component contains the classes (called descriptors) that the compiler (Parser, SemanticAnalysis and Optimizer) uses to pass information to the operator trees used by the execution code.
  • MetaData Layer (ql/metadata) - This component is used by the query processor to interface with the MetaStore in order to retrieve information about tables, partitions and the columns of the table. This information is used by the compiler to compile SQL to a series of map/reduce tasks.
  • Map/Reduce Execution Engine (ql/exec) - This component contains all the query operators and the framework that is used to invoke those operators from within the map/reduce tasks.
  • Hadoop Record Readers, Input and Output Formatters for Hive (ql/io) - This component contains the record readers and the input and output formatters that Hive registers with a Hadoop Job.
  • Sessions (ql/session) - A rudimentary session implementation for Hive.
  • Type interfaces (ql/typeinfo) - This component provides all the type information for table columns that is retrieved from the MetaStore and the SerDes.
  • Hive Function Framework (ql/udf) - Framework and implementation of Hive operators, Functions and Aggregate Functions. This component also contains the interfaces that a user can implement to create user defined functions.
  • Tools (ql/tools) - Some simple tools provided by the query processing framework. Currently, this component contains the implementation of the lineage tool that can parse the query and show the source and destination tables of the query.

A helpful overview of the Hive query processor can be found in this Hive Anatomy slide deck.

Compiler

Parser

TypeChecking

Semantic Analysis

...

Code Block
$ build/dist/bin/hive

If Hive fails at runtime, try $ ant very-clean package to delete the Ivy cache before rebuilding.

Running Hive Without a Hadoop Cluster

...

Run all tests:

Code Block
ant package test

Run all positive test queries:

...

Code Block
ant test -Dtestcase=TestCliDriver -Dqfile=groupby1.q

The above test produces the following files:

  • build/ql/test/TEST-org.apache.hadoop.hive.cli.TestCliDriver.txt - Log output for the test. This can be helpful when examining test failures.
  • build/ql/test/logs/groupby1.q.out - Actual query result for the test. This result is compared to the expected result as part of the test.
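
To inspect a failure by hand, the actual result can be compared against the expected result with a plain diff. The following is a minimal sketch, not the comparison the test harness itself performs (which may differ, for example by ignoring some lines). The helper name compare_q_result is hypothetical, and the expected-result location mentioned afterwards is an assumption that is not stated on this page.

```shell
# Hypothetical helper: compare an actual query result with an expected one.
# $1 = actual result file, $2 = expected result file.
# Prints a unified diff when the files differ and returns non-zero.
compare_q_result() {
  if diff -u "$2" "$1"; then
    echo "MATCH: $1"
  else
    echo "DIFFERS: $1"
    return 1
  fi
}
```

Assuming the expected results live under ql/src/test/results/clientpositive, a call could look like: compare_q_result build/ql/test/logs/groupby1.q.out ql/src/test/results/clientpositive/groupby1.q.out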

The Hive tests do not appear to run successfully after a clean unless you run ant package first. It is not clear why build.xml does not encode this dependency.
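
Until the build encodes that dependency, one workaround is to chain the two steps so the test only starts if packaging succeeds. This is a small sketch; the function name run_single_test and the ANT override variable are hypothetical conveniences, not part of the Hive build.

```shell
# ANT can be overridden (hypothetical hook), e.g. ANT=echo for a dry run.
ANT="${ANT:-ant}"

# Package first, then run one TestCliDriver query file.
# $1 = query file name, e.g. groupby1.q
run_single_test() {
  "$ANT" package && "$ANT" test -Dtestcase=TestCliDriver -Dqfile="$1"
}
```

For example, run_single_test groupby1.q runs ant package followed by the single-query test shown above.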

Adding new unit tests

First, write a new query file, myname.q, in ql/src/test/queries/clientpositive.
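
As a rough illustration of this first step, the query file can be created from the shell. The query below is only a placeholder: it assumes a test table named src with a column named key, which is not stated on this page, so substitute a query against tables your test environment actually provides.

```shell
# Directory for positive client tests, as named in the text above.
query_dir=ql/src/test/queries/clientpositive
mkdir -p "$query_dir"   # no-op inside an existing checkout

# Placeholder query; the table src and column key are assumptions.
cat > "$query_dir/myname.q" <<'EOF'
SELECT src.key, count(1) FROM src GROUP BY src.key;
EOF
```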

Then, run the test with the query and overwrite the result (useful when you add a new test)

...