Live Long and Process (LLAP) functionality was added in Hive 2.0 (HIVE-7926 and associated tasks).

For the design of LLAP, see the LLAP Design Document. For configuration of LLAP, see the LLAP Section of Configuration Properties.

Hive has become significantly faster thanks to various features and improvements built by the community over the past two years, including Tez and cost-based optimization (CBO).

To keep up the momentum, here are some examples of what we think will take us to the next level:

  • Asynchronous spindle-aware IO
  • Pre-fetching and caching of column chunks
  • Multi-threaded JIT-friendly operator pipelines
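The second item, caching of column chunks, can be pictured with a minimal sketch. It assumes a hypothetical cache keyed by file, stripe, and column index; `ChunkKey` and `ColumnChunkCache` are illustrative names, not Hive classes, and the plain byte-bounded LRU eviction shown here is a stand-in policy, not LLAP's actual cache. Pre-fetching is not modeled.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache key: one column's chunk within one stripe of one file.
final class ChunkKey {
    final String file;
    final int stripe;
    final int column;

    ChunkKey(String file, int stripe, int column) {
        this.file = file;
        this.stripe = stripe;
        this.column = column;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return stripe == k.stripe && column == k.column && file.equals(k.file);
    }

    @Override
    public int hashCode() {
        return Objects.hash(file, stripe, column);
    }
}

// Byte-bounded LRU cache of column chunks (a stand-in policy, not LLAP's).
final class ColumnChunkCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<ChunkKey, byte[]> chunks =
            new LinkedHashMap<>(16, 0.75f, true);

    ColumnChunkCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns the cached chunk or null; a hit marks it most recently used.
    byte[] get(ChunkKey key) {
        return chunks.get(key);
    }

    void put(ChunkKey key, byte[] data) {
        byte[] old = chunks.put(key, data);
        if (old != null) usedBytes -= old.length;
        usedBytes += data.length;
        // Evict least recently used chunks until back under capacity,
        // never evicting the chunk that was just inserted.
        Iterator<Map.Entry<ChunkKey, byte[]>> it = chunks.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<ChunkKey, byte[]> eldest = it.next();
            if (eldest.getKey().equals(key)) continue;
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    boolean contains(ChunkKey key) {
        return chunks.containsKey(key); // does not change access order
    }

    long usedBytes() {
        return usedBytes;
    }
}

class ColumnChunkCacheDemo {
    public static void main(String[] args) {
        ColumnChunkCache cache = new ColumnChunkCache(200);
        ChunkKey a = new ChunkKey("part-0", 0, 1);
        ChunkKey b = new ChunkKey("part-0", 0, 2);
        ChunkKey c = new ChunkKey("part-0", 1, 1);
        cache.put(a, new byte[100]);
        cache.put(b, new byte[100]);
        cache.get(a);                // b is now least recently used
        cache.put(c, new byte[100]); // 300 > 200 bytes, so b is evicted
        // prints "true false true"
        System.out.println(cache.contains(a) + " " + cache.contains(b) + " " + cache.contains(c));
    }
}
```

Keying by column rather than by whole file is what lets a query that touches only a few columns keep just those chunks resident.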

To achieve this, we are proposing a hybrid execution model that consists of a long-lived daemon, which replaces direct interactions with the HDFS DataNode, and a tightly integrated DAG-based framework.
Functionality such as caching, pre-fetching, some query processing, and access control will move into the daemon.

Small or short queries can largely be processed by this daemon directly, while any heavy lifting will be performed in standard YARN containers.
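This division of labor can be sketched as a placement decision per query fragment. Everything here is hypothetical: the `Fragment`, `Target`, and `HybridPlanner` names, the byte-size cutoff for "small" work, and the count of idle daemon executors are illustrative assumptions, not Hive's planner logic or configuration.

```java
// Hypothetical placement targets for a query fragment.
enum Target { LLAP_DAEMON, YARN_CONTAINER }

// Hypothetical fragment description; only an input-size estimate is modeled.
final class Fragment {
    final String name;
    final long estimatedInputBytes;

    Fragment(String name, long estimatedInputBytes) {
        this.name = name;
        this.estimatedInputBytes = estimatedInputBytes;
    }
}

// Sends small fragments to the daemon while it has idle executors,
// and everything else to standard YARN containers.
final class HybridPlanner {
    private final long smallFragmentBytes; // illustrative cutoff for "small" work
    private int idleDaemonExecutors;       // executors currently free in the daemon

    HybridPlanner(long smallFragmentBytes, int idleDaemonExecutors) {
        this.smallFragmentBytes = smallFragmentBytes;
        this.idleDaemonExecutors = idleDaemonExecutors;
    }

    Target place(Fragment f) {
        if (f.estimatedInputBytes <= smallFragmentBytes && idleDaemonExecutors > 0) {
            idleDaemonExecutors--; // reserve a daemon executor for this fragment
            return Target.LLAP_DAEMON;
        }
        return Target.YARN_CONTAINER; // heavy lifting, or the daemon is busy
    }

    // Called when a fragment finishes, returning its executor if it had one.
    void release(Target t) {
        if (t == Target.LLAP_DAEMON) idleDaemonExecutors++;
    }
}

class HybridPlannerDemo {
    public static void main(String[] args) {
        long mb = 1024L * 1024;
        HybridPlanner planner = new HybridPlanner(64 * mb, 1);
        Target t1 = planner.place(new Fragment("small scan", 10 * mb));
        Target t2 = planner.place(new Fragment("another small scan", 5 * mb));
        Target t3 = planner.place(new Fragment("large join", 10_000 * mb));
        // prints "LLAP_DAEMON YARN_CONTAINER YARN_CONTAINER"
        System.out.println(t1 + " " + t2 + " " + t3);
    }
}
```

A size cutoff is only one plausible signal; a real planner could also weigh operator types, data locality, or what is already cached in the daemon.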

Similar to the DataNode, LLAP daemons can be used by other applications as well, especially if a relational view on the data is preferred over file-centric processing.

We are thus planning to open the daemon up through optional APIs (e.g., InputFormat) that other data processing frameworks can leverage as a building block.

Last but not least, fine-grained column-level access control, a key requirement for mainstream adoption of Hive, fits nicely into this model.
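Because the daemon sits on the read path, a column-level check can be enforced in one place before any data leaves it. This minimal sketch assumes a hypothetical in-memory policy (`ColumnPolicy`) that maps each user to the columns they may read; it does not use or model Hive's actual authorization interfaces.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical column-level policy: the columns each user may read.
final class ColumnPolicy {
    private final Map<String, Set<String>> allowedColumnsByUser;

    ColumnPolicy(Map<String, Set<String>> allowedColumnsByUser) {
        this.allowedColumnsByUser = allowedColumnsByUser;
    }

    // Checked by the daemon before any column data is read or returned.
    // Rejects the whole request if any requested column is not permitted.
    List<String> authorize(String user, List<String> requestedColumns) {
        Set<String> allowed = allowedColumnsByUser.getOrDefault(user, Set.of());
        for (String column : requestedColumns) {
            if (!allowed.contains(column)) {
                throw new SecurityException(
                        "user " + user + " may not read column " + column);
            }
        }
        return requestedColumns;
    }
}

class ColumnPolicyDemo {
    public static void main(String[] args) {
        ColumnPolicy policy = new ColumnPolicy(
                Map.of("analyst", Set.of("id", "region")));
        System.out.println(policy.authorize("analyst", List.of("id", "region")));
        try {
            policy.authorize("analyst", List.of("id", "ssn"));
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Rejecting the whole request is one design choice; silently projecting away disallowed columns is an alternative that is friendlier to callers but can hide mistakes.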

Persistent daemon

Execution Engine

Query Fragment Execution

I/O

Caching

Workload Management

ACID Support

Security

Resources

LLAP Design Document

Hive Contributor Meetup Presentation