
Hive Accumulo Integration

Overview

Apache Accumulo is a sorted, distributed key-value store based on the Google BigTable paper. Accumulo's API is expressed in terms of Keys and Values, which offer the greatest flexibility for reading and writing data; however, higher-level query abstractions are typically left as an exercise for the user. Using Apache Hive as a SQL interface to Accumulo complements Accumulo's existing high-throughput batch access and low-latency random lookups.
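To make the Keys-and-Values model concrete, the following Python sketch represents an Accumulo key as a row ID, column family, column qualifier, column visibility, and timestamp, and orders entries by Accumulo's key order. It is a simplified illustration: the `Key` class and `sorted_entries` helper are hypothetical stand-ins, not Accumulo's Java API, and the real key also carries a delete flag and stores its fields as bytes.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of an Accumulo key (illustration only).
# The real org.apache.accumulo.core.data.Key stores these fields as bytes
# and also tracks a delete flag.
@dataclass(frozen=True)
class Key:
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Row, family, qualifier and visibility sort ascending; timestamps
        # sort descending so the newest version of a cell comes first.
        return (self.row, self.family, self.qualifier, self.visibility, -self.timestamp)


def sorted_entries(entries):
    """Return (Key, value) pairs in Accumulo's key order."""
    return sorted(entries, key=lambda kv: kv[0].sort_key())


if __name__ == "__main__":
    data = [
        (Key("row2", "info", "name", "", 5), "bob"),
        (Key("row1", "info", "name", "", 3), "alice-old"),
        (Key("row1", "info", "name", "", 7), "alice"),
        (Key("row1", "info", "age", "", 7), "30"),
    ]
    for key, value in sorted_entries(data):
        print(key.row, key.family, key.qualifier, key.timestamp, value)
```

Because keys are kept in this order, all cells of a row are stored together and the newest version of each cell is encountered first.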

Implementation

The initial implementation was added in Hive 0.14 (HIVE-7068) and is designed to work with Accumulo 1.6.x. The implementation has two main components: the AccumuloStorageHandler and the AccumuloPredicateHandler. The AccumuloStorageHandler is a StorageHandler implementation whose primary roles are to manage the mapping of a Hive table to an Accumulo table and to configure Hive queries. The AccumuloPredicateHandler is used to push down filter operations to Accumulo so that data is reduced more efficiently.
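Why pushing filters down to a sorted store pays off can be sketched in a few lines of Python. This is a hypothetical illustration, not the AccumuloPredicateHandler's actual logic: it assumes a filter on the row ID alone, models a table as a sorted list of row IDs, and compares reading everything and filtering in the query engine against converting the predicate into a scan range.

```python
import bisect

# Hypothetical illustration of predicate pushdown against a sorted store.
# This is NOT Hive's AccumuloPredicateHandler; it only shows why turning a
# filter on the sorted row ID into a scan range reduces the data read.


def full_scan_filter(rows, lo, hi):
    """Read every row, then apply lo <= row < hi in the query engine."""
    rows_read = 0
    result = []
    for row in rows:
        rows_read += 1
        if lo <= row < hi:
            result.append(row)
    return result, rows_read


def range_scan(rows, lo, hi):
    """Push the predicate down as a [lo, hi) range over the sorted rows.

    Binary search locates the range boundaries, so only matching rows are read.
    """
    start = bisect.bisect_left(rows, lo)
    end = bisect.bisect_left(rows, hi)
    result = rows[start:end]
    return result, len(result)


if __name__ == "__main__":
    rows = [f"row{i:03d}" for i in range(1000)]
    full, full_read = full_scan_filter(rows, "row100", "row110")
    pushed, pushed_read = range_scan(rows, "row100", "row110")
    print(full == pushed, full_read, pushed_read)
```

With 1,000 rows and the predicate `row100 <= rowid < row110`, the full scan reads all 1,000 rows while the range scan reads only the 10 matching ones; both return the same result.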

Usage

To issue queries against Accumulo using Hive, the following four parameters must be set in the Hive configuration:

  • accumulo.instance.name
  • accumulo.zookeepers
  • accumulo.user.name
  • accumulo.user.pass
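As a sketch of supplying these parameters, the Python helper below checks that all four are present and renders them as `--hiveconf key=value` arguments for the Hive command line. Passing them via `--hiveconf` is only one option; the helper name `hiveconf_args` and all property values shown are hypothetical placeholders.

```python
# The four properties listed above. The helper and the example values
# below are hypothetical placeholders for illustration.
REQUIRED_PROPERTIES = (
    "accumulo.instance.name",
    "accumulo.zookeepers",
    "accumulo.user.name",
    "accumulo.user.pass",
)


def hiveconf_args(conf):
    """Validate conf and return ["--hiveconf", "key=value", ...] arguments."""
    missing = [p for p in REQUIRED_PROPERTIES if not conf.get(p)]
    if missing:
        raise ValueError("missing Accumulo properties: " + ", ".join(missing))
    args = []
    for prop in REQUIRED_PROPERTIES:
        args += ["--hiveconf", f"{prop}={conf[prop]}"]
    return args


if __name__ == "__main__":
    conf = {
        "accumulo.instance.name": "accumulo",
        "accumulo.zookeepers": "zk1:2181,zk2:2181",
        "accumulo.user.name": "hive",
        "accumulo.user.pass": "secret",
    }
    print(" ".join(["hive"] + hiveconf_args(conf)))
```

Failing fast on a missing property surfaces configuration mistakes before a query is submitted rather than at connection time.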
