Live Long and Process (LLAP) functionality was added in Hive 2.0 (HIVE-7926 and associated tasks). HIVE-9850 links documentation, features, and issues for this enhancement.

For configuration of LLAP, see the LLAP Section of Configuration Properties.

Overview

Hive has become significantly faster in recent years thanks to features and improvements built by the community, including Tez and cost-based optimization.

The following were needed to take Hive to the next level:

  • Asynchronous spindle-aware IO
  • Pre-fetching and caching of column chunks
  • Multi-threaded JIT-friendly operator pipelines

LLAP provides a hybrid execution model. It consists of a long-lived daemon, which replaces direct interactions with the HDFS DataNode, and a tightly integrated DAG-based framework. Functionality such as caching, pre-fetching, some query processing, and access control is moved into the daemon. Small or short queries are largely processed by this daemon directly, while any heavy lifting is performed in standard YARN containers.

Similar to the DataNode, LLAP daemons can be used by other applications as well, especially if a relational view on the data is preferred over file-centric processing. The daemon is also open through optional APIs (e.g., InputFormat) that other data processing frameworks can use as a building block.

Last, but not least, fine-grained column-level access control -- a key requirement for mainstream adoption of Hive -- fits nicely into this model.

The diagram below shows an example execution with LLAP. The Tez AM orchestrates overall execution. The initial stage of the query is pushed into LLAP, and large shuffles are performed in their own containers. Multiple queries and applications can access LLAP concurrently.

[Image: example execution with LLAP]

Persistent daemon

To facilitate caching and JIT optimization, and to eliminate most of the startup costs, we will run a daemon on the worker nodes of the cluster. The daemon will handle I/O, caching, and query fragment execution.
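The caching and pre-fetching role of a long-lived process can be illustrated with a small sketch. The Python below is a toy model only, not LLAP's implementation: the class name `ColumnChunkCache`, the `(file path, column name, row-group index)` key layout, and the LRU eviction policy are all assumptions chosen for illustration. It shows why a persistent daemon helps: chunks loaded for one query fragment stay in memory for the next one.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Tuple

# Hypothetical cache key: (file path, column name, row-group index).
ChunkKey = Tuple[str, str, int]


class ColumnChunkCache:
    """Toy LRU cache standing in for a daemon's in-memory column-chunk cache."""

    def __init__(self, capacity: int, loader: Callable[[ChunkKey], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader  # reads a chunk from storage on a cache miss
        self._chunks: "OrderedDict[ChunkKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, key: ChunkKey, data: bytes) -> None:
        # Insert as most recently used, then evict least recently used entries.
        self._chunks[key] = data
        self._chunks.move_to_end(key)
        while len(self._chunks) > self._capacity:
            self._chunks.popitem(last=False)

    def get(self, key: ChunkKey) -> bytes:
        """Return a chunk, loading it from storage only if it is not cached."""
        if key in self._chunks:
            self.hits += 1
            self._chunks.move_to_end(key)
            return self._chunks[key]
        self.misses += 1
        data = self._loader(key)
        self._store(key, data)
        return data

    def prefetch(self, keys: Iterable[ChunkKey]) -> None:
        """Load chunks ahead of use; does not count as a hit or a miss."""
        for key in keys:
            if key not in self._chunks:
                self._store(key, self._loader(key))
```

Because the cache object outlives any single request, a second `get` for the same key returns without calling the loader again, which is the effect a per-query container cannot provide on its own.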

...

Hive Contributor Meetup Presentation

Try Hive LLAP