
...

Note (TODO)

This section requires some definitive guidance.

Tools and frameworks

When constructing tests, it is helpful to have a framework that simplifies their declaration and execution. Typically these tools allow the specification of many of the following:

  • Execution environment configuration: usually hiveconf and hivevar parameters.
  • Declaring input test data: creating or selecting files that back some source tables.
  • Definition of the executable component of the test: normally the HQL script under test.
  • Expectations: these can take the form of a reference data file, or alternatively fine-grained assertions can be made with further queries.
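
As a rough illustration, the artifacts supplied to such a tool might resemble the following sketch. The database, table, and file names (source_db, orders, test-data/orders.tsv, target_db.daily_totals) are hypothetical and not tied to any particular framework.

    -- Execution environment configuration is typically passed separately,
    -- e.g. as hiveconf/hivevar parameters on the command line.

    -- Input test data: DDL plus a small local file backing a source table.
    CREATE DATABASE IF NOT EXISTS source_db;
    CREATE TABLE source_db.orders (order_id STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    LOAD DATA LOCAL INPATH 'test-data/orders.tsv' INTO TABLE source_db.orders;

    -- The HQL script under test would then read source_db.orders and write
    -- its results to target_db.daily_totals.

    -- Expectation expressed as a fine-grained assertion query: the harness runs
    -- it after the script under test and compares the result with an expected
    -- value (here, zero rows with a negative total) or with a reference file.
    SELECT count(*) FROM target_db.daily_totals WHERE total_amount < 0;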

The precise details are of course framework-specific, but generally speaking these tools manage the full lifecycle of a test by composing the artifacts provided by the developer into a sequence such as:

  1. Configure Hive execution environment.
  2. Set up test input data.
  3. Execute HQL script under test.
  4. Extract data written by the executed script.
  5. Make assertions on the data extracted.
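
Concretely, a harness might drive that sequence with something equivalent to the HQL session below. The script and table names are hypothetical, and most tools generate or manage these steps rather than requiring them to be written by hand.

    -- 1. Configure the Hive execution environment.
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hivevar:test_data_root=/tmp/hql_test;

    -- 2. Set up test input data.
    SOURCE create_and_load_source_tables.hql;

    -- 3. Execute the HQL script under test.
    SOURCE aggregate_daily_totals.hql;

    -- 4. Extract the data written by the script, so that
    -- 5. assertions can be made on the extracted rows.
    SELECT order_date, total_amount
    FROM target_db.daily_totals
    ORDER BY order_date;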

The following tools provide test harnesses for the local execution of Hive tests:

  • HiveRunner: Test cases are declared using Java, HQL and JUnit and can execute locally in your IDE. This library focuses on ease of use and execution speed. No local Hive/Hadoop installation is required. It provides full test isolation and seamless UDF integration (UDFs need only be on the project classpath).
  • beetest: Test cases are declared using HQL and 'expected' data files. Test suites are executed using a script on the command line.
  • hive_test: Test cases are declared using Java, HQL and JUnit and can execute locally in your IDE.
  • How to utilise the Hive project's internal test framework

...

Useful practices

  • Modularise large or complex queries into multiple smaller components. These are easier to comprehend, maintain, and test.
  • Use macros or UDFs to encapsulate repeated or complex column expressions.
  • Use Hive variables to decouple HQL scripts from specific environments. For example, it might be wise to use LOCATION '${myTableLocation}' in preference to LOCATION '/hard/coded/path' (see the sketch after this list).
  • Keep the scope of tests small. Making coarse assertions on the entire contents of a table is brittle and has a high maintenance requirement.
  • Use the SOURCE command to compose multiple smaller HQL scripts.
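
A brief sketch of two of these practices follows; the macro, variable, and table names (normalise_name, myTableLocation, users) are hypothetical. A temporary macro encapsulates a repeated column expression, and a Hive variable keeps the table location out of the script itself.

    -- Encapsulate a repeated column expression once rather than copy-pasting it.
    CREATE TEMPORARY MACRO normalise_name(name STRING)
      lower(trim(name));

    -- Decouple the script from a specific environment; the location is supplied
    -- externally, e.g. hive --hivevar myTableLocation=/tmp/test/users -f script.hql
    CREATE EXTERNAL TABLE users (id STRING, name STRING)
      LOCATION '${myTableLocation}';

    SELECT id, normalise_name(name) AS name
    FROM users;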