
...

  • Defining boundaries between components: How can, and how should, a problem be decomposed into smaller, testable units?
  • Harness provision: Providing a local execution environment that seamlessly supports Hive’s features (UDFs, etc.) in a local IDE setting.
  • Speed of execution: The goal is to have large numbers of small, isolated tests. Test isolation requires frequent setup and teardown, and the costs incurred are multiplied by the number of tests.
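The point about speed of execution is simple arithmetic: if every test performs its own setup and teardown, the suite's run time grows linearly with the number of tests, and a slow harness start-up dominates when test bodies are small. The sketch below is a hypothetical cost model, not a measurement of Hive; `PerTestCost`, `suite_duration` and `overhead_fraction` are illustrative names, and the timings are made-up values chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerTestCost:
    """Hypothetical per-test costs in seconds; the values used below are illustrative."""
    setup: float     # e.g. creating a session and the input tables
    body: float      # executing the query under test and checking results
    teardown: float  # e.g. dropping tables and closing the session


def suite_duration(cost: PerTestCost, n_tests: int) -> float:
    """Total run time when every isolated test does its own setup and teardown."""
    return n_tests * (cost.setup + cost.body + cost.teardown)


def overhead_fraction(cost: PerTestCost) -> float:
    """Share of each test's time spent on setup and teardown rather than the test body."""
    total = cost.setup + cost.body + cost.teardown
    return (cost.setup + cost.teardown) / total


if __name__ == "__main__":
    # Assumed timings: a slow harness start-up and a fast query under test.
    cost = PerTestCost(setup=2.0, body=0.5, teardown=1.0)
    for n in (10, 100, 1000):
        print(n, suite_duration(cost, n))
    print(f"overhead: {overhead_fraction(cost):.0%}")
```

Under these assumed numbers most of each test's time goes to setup and teardown, so reducing that fixed cost pays off far more than speeding up individual queries.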


...

Modularisation

...

Processes implemented in Hive become easier to test effectively, and more resilient to change, when they are modularised. Although Hive provides a number of vectors for modularisation, it is not always clear how a large process can be decomposed. The features for encapsulating query logic into components fall into two orthogonal concerns: column-level logic and set-level logic. Column-level logic refers to the expressions applied to individual columns or groups of columns in the query, commonly described as ‘functions’. Set-level logic concerns HQL constructs that manipulate groupings of data, such as column projection with SELECT, GROUP BY aggregates, JOINs, and ORDER BY sorting. In either case we expect individual components to live in their own source file or deployable artifact, and to be imported as needed by the composing process. For HQL-based components, the SOURCE command provides this functionality.
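To make the split between the two concerns concrete, here is a language-neutral sketch in Python rather than HQL. It is an analogy only: `normalise_country`, `count_by_country` and `country_counts` are hypothetical names, and the rows are toy data. The column-level component plays the role of an expression such as `upper(trim(country))`, the set-level component plays the role of a `GROUP BY country` with `COUNT(*)`, and the composition applies one after the other.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def normalise_country(code: str) -> str:
    """Column-level logic: a scalar expression on a single value,
    analogous to upper(trim(country)) in HQL."""
    return code.strip().upper()


def count_by_country(rows: Iterable[dict]) -> dict:
    """Set-level logic: operates on a collection of rows, analogous to
    SELECT country, COUNT(*) ... GROUP BY country."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["country"]] += 1
    return dict(counts)


def country_counts(rows: Iterable[dict]) -> dict:
    """Composition: apply the column-level component to each row,
    then hand the resulting set to the set-level component."""
    normalised: Iterator[dict] = (
        {**row, "country": normalise_country(row["country"])} for row in rows
    )
    return count_by_country(normalised)
```

Because each component has a narrow boundary, it can be tested alone with a tiny input (a single string for the column-level function, a handful of rows for the aggregate) before the composition is tested as a whole.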

...