...

The initial implementation was added to Hive 0.14 in HIVE-7068 and is designed to work with Accumulo 1.6.x. Two main components make up the implementation: the AccumuloStorageHandler and the AccumuloPredicateHandler. The AccumuloStorageHandler is a StorageHandler implementation. Its primary roles are to manage the mapping of Hive tables to Accumulo tables and to configure Hive queries. The AccumuloPredicateHandler is used to push filter operations down to Accumulo so that data is reduced more efficiently.

Accumulo Configuration

The only additional Accumulo configuration required is to add accumulo-storage-handler.jar, provided as part of the Hive distribution, to the Accumulo server classpath. This can be accomplished in a variety of ways: copy or symlink the jar into $ACCUMULO_HOME/lib or $ACCUMULO_HOME/lib/ext, or include the path to the jar in general.classpaths in accumulo-site.xml. Be sure to restart the Accumulo tablet servers if the jar is added to the classpath in a non-dynamic fashion (using $ACCUMULO_HOME/lib or general.classpaths in accumulo-site.xml).

Usage

To issue queries against Accumulo using Hive, four parameters must be provided by the Hive configuration:

...

CREATE EXTERNAL TABLE countries(key string, name string, country string, country_id int)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,info:name,info:country,info:country_id");
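
The statement above declares four Hive columns and four entries in "accumulo.columns.mapping". Reading the example positionally, each Hive column lines up with the mapping entry at the same index: ":rowID" refers to the Accumulo row ID, and every other entry is a column family and qualifier separated by a colon. The short Python sketch below only illustrates that pairing. It is not part of Hive or Accumulo, the helper name pair_columns is hypothetical, and it assumes the entry is spelled exactly ":rowID" as in the example.

```python
# Illustrative only: pair the Hive columns from the CREATE TABLE statement
# with the entries of "accumulo.columns.mapping", position by position.
# pair_columns is a hypothetical helper, not a Hive or Accumulo API.

def pair_columns(hive_columns, mapping):
    """Return a list of (hive_column, accumulo_target) pairs.

    hive_columns: Hive column names in declaration order.
    mapping: the comma-separated "accumulo.columns.mapping" string.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("expected one mapping entry per Hive column")

    pairs = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":rowID":
            # Assumption: exact spelling as used in the example above.
            target = "row ID"
        else:
            family, qualifier = entry.split(":", 1)
            target = "family={}, qualifier={}".format(family, qualifier)
        pairs.append((column, target))
    return pairs


hive_columns = ["key", "name", "country", "country_id"]
mapping = ":rowID,info:name,info:country,info:country_id"

for column, target in pair_columns(hive_columns, mapping):
    print("{:<10} -> {}".format(column, target))
```

Under this reading, the Hive column key is backed by the Accumulo row ID, while name, country and country_id are each backed by a qualifier in the info column family.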

...

Acknowledgements

I would be remiss not to mention the efforts of Brian Femiano, whose initial prototype for Accumulo-Hive integration was the basis for this storage handler.