LLAP

Live Long and Process (LLAP) functionality was added in Hive 2.0 (HIVE-7926 and associated tasks).

Hive has become significantly faster thanks to various features and improvements built by the community over the past two years, including Tez and cost-based optimization.

To keep up that momentum, here are some examples of what we think will take us to the next level:

Asynchronous spindle-aware IO
Pre-fetching and caching of column chunks
Multi-threaded JIT-friendly operator pipelines
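
As a rough illustration of the second item, the sketch below shows one way a cache of column chunks with one-chunk-ahead pre-fetching could behave. It is not LLAP's implementation: the class name ColumnChunkCache, the (file, column, chunk index) key layout, the LRU eviction policy and the loader callback are all assumptions made for this example, and the pre-fetch runs synchronously to keep the code short.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical key layout: (file path, column name, chunk index).
ChunkKey = Tuple[str, str, int]


class ColumnChunkCache:
    """Tiny LRU cache of column chunks with one-chunk-ahead pre-fetching."""

    def __init__(self, loader: Callable[[ChunkKey], bytes], capacity: int = 4):
        self._loader = loader      # reads a single chunk from storage
        self._capacity = capacity  # maximum number of chunks kept in memory
        self._chunks: "OrderedDict[ChunkKey, bytes]" = OrderedDict()
        self.loads = 0             # how many times storage was actually read

    def _load(self, key: ChunkKey) -> bytes:
        if key in self._chunks:
            # Cache hit: mark the chunk as most recently used.
            self._chunks.move_to_end(key)
            return self._chunks[key]
        data = self._loader(key)
        self.loads += 1
        self._chunks[key] = data
        if len(self._chunks) > self._capacity:
            # Evict the least recently used chunk.
            self._chunks.popitem(last=False)
        return data

    def get(self, path: str, column: str, index: int) -> bytes:
        data = self._load((path, column, index))
        # Pre-fetch the following chunk. This is done synchronously here;
        # an asynchronous IO layer would issue it in the background.
        self._load((path, column, index + 1))
        return data
```

With this sketch, a scan that reads chunk 0 and then chunk 1 of the same column touches storage for chunk 1 only once, because the second request is served from the cache while chunk 2 is fetched ahead.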

To achieve this, we are proposing a hybrid execution model consisting of a long-lived daemon, which replaces direct interactions with the HDFS DataNode, and a tightly integrated DAG-based framework.
Functionality such as caching, pre-fetching, some query processing and access control will move into the daemon.
Small/short queries can be largely processed by this daemon directly, while any heavy lifting will be performed in standard YARN containers.
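
The split between daemon and containers can be pictured with a minimal sketch, assuming a simple size-based rule. None of the names below come from Hive: Fragment, Target, choose_target, the heavy_operator flag and the 256 MB threshold are hypothetical, and the actual placement decision may take many more factors into account.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LLAP_DAEMON = "llap-daemon"
    YARN_CONTAINER = "yarn-container"


@dataclass(frozen=True)
class Fragment:
    """A unit of query work (hypothetical model for this sketch)."""
    name: str
    estimated_input_bytes: int
    heavy_operator: bool = False  # e.g. a large shuffle join (assumed flag)


# Hypothetical cut-off: fragments reading more than this go to a container.
DAEMON_INPUT_LIMIT = 256 * 1024 * 1024


def choose_target(fragment: Fragment, daemon_has_capacity: bool = True) -> Target:
    """Send small, light fragments to the daemon and everything else to YARN."""
    if fragment.heavy_operator or not daemon_has_capacity:
        return Target.YARN_CONTAINER
    if fragment.estimated_input_bytes <= DAEMON_INPUT_LIMIT:
        return Target.LLAP_DAEMON
    return Target.YARN_CONTAINER


if __name__ == "__main__":
    lookup = Fragment("dimension-lookup", 8 * 1024 * 1024)
    big_join = Fragment("fact-join", 50 * 1024 ** 3, heavy_operator=True)
    print(choose_target(lookup).value)
    print(choose_target(big_join).value)
```

The point of the sketch is only the shape of the decision: short work stays in the long-lived daemon and avoids container start-up cost, while heavy work still runs in standard YARN containers.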

Similar to the DataNode, LLAP daemons can be used by other applications as well, especially if a relational view on the data is preferred over file-centric processing.

We’re thus planning to open the daemon up through optional APIs (e.g., InputFormat) that other data processing frameworks can use as a building block.

Last but not least, fine-grained column-level access control, a key requirement for mainstream adoption of Hive, fits nicely into this model.
