  • HiveRunner: Test cases are declared using Java, HQL and JUnit and can execute locally in your IDE. This library focuses on ease of use and execution speed. No local Hive/Hadoop installation is required. It provides full test isolation, fine-grained assertions, and seamless UDF integration (UDFs need only be on the project classpath).
  • beetest: Test cases are declared using HQL and 'expected' data files. Test suites are executed using a script on the command line.
  • hive_test: Test cases are declared using Java, HQL and JUnit and can execute locally in your IDE.
  • How to utilise the Hive project's internal test framework.

Useful practices

The following Hive-specific practices make processes more amenable to unit testing and help simplify individual tests.

  • Modularise large or complex queries into multiple smaller components. These are easier to comprehend, maintain, and test.
  • Use macros or UDFs to encapsulate repeated or complex column expressions.
  • Use Hive variables to decouple HQL scripts from specific environments. For example, prefer LOCATION '${myTableLocation}' to LOCATION '/hard/coded/path'.
  • Keep the scope of tests small. Coarse assertions on the entire contents of a table are brittle and costly to maintain.
  • Use the SOURCE command to combine multiple smaller HQL scripts.
  • Test macros and the integration of UDFs by creating simple test tables and applying the functions to columns in those tables.
  • Test UDFs by invoking the lifecycle methods directly (initialize, evaluate, etc.) in a standard testing framework such as JUnit.
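
Several of the practices above can be combined in one small test script. The following is a minimal sketch, not a prescribed layout: the file names (create_events.hql, macros.hql), the table and column names, the clean_name macro, and the /tmp path are all hypothetical, and it assumes a Hive client that supports the SOURCE command and a Hive version with INSERT ... VALUES (0.14 or later).

```sql
-- create_events.hql (hypothetical): the location comes from a Hive variable,
-- so a test can point the table at a temporary directory.
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  id BIGINT,
  raw_name STRING
)
LOCATION '${myTableLocation}';

-- macros.hql (hypothetical): a repeated column expression wrapped in a macro.
CREATE TEMPORARY MACRO clean_name(s STRING) lower(trim(s));

-- Test script (hypothetical): set the variable, pull in the scripts under
-- test with SOURCE, then apply the macro to a small, purpose-built table.
SET hivevar:myTableLocation=/tmp/hive-test/events;
SOURCE create_events.hql;
SOURCE macros.hql;

CREATE TABLE macro_input (raw_name STRING);
INSERT INTO TABLE macro_input VALUES ('  Alice '), ('BOB');

-- Inspect or assert on just these rows rather than on a whole production table.
SELECT raw_name, clean_name(raw_name) AS cleaned
FROM macro_input;
```

Keeping the table definition and the macro in separate scripts means each one can be sourced and checked on its own, and the same scripts run unchanged in production with a different value for myTableLocation.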