Motivation

As data sizes increase, I/O becomes the main bottleneck for many analytical queries. To address this issue, we introduce block compression at the storage level.

Design

This section details the components added to AsterixDB.

Compressor/Decompressor API:

Hyracks now exposes two interfaces: ICompressorDecompressorFactory and ICompressorDecompressor. Hyracks offers three built-in compression schemes: Snappy, LZ4 and LZ4HC (High Compression). The compressor/decompressor API is designed to compress and decompress arbitrary byte arrays, so it is not limited to storage and could be adopted elsewhere (for example, in network communication). Implementations of ICompressorDecompressor must be stateless so that they are thread-safe.
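To illustrate the statelessness requirement, here is a minimal sketch. The interface ByteCompressor and the class DeflateCompressor are hypothetical names and do not reproduce the actual Hyracks signatures. The sketch also uses the JDK's built-in Deflate codec (java.util.zip) as a stand-in for Snappy, LZ4 and LZ4HC, which would require external libraries. The point is only that the object holds no fields, so a single instance can be shared across threads.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical interface for illustration; NOT the Hyracks ICompressorDecompressor API.
interface ByteCompressor {
    byte[] compress(byte[] src);

    byte[] decompress(byte[] src, int originalLength);
}

// Stateless: no instance fields. Each call creates and releases its own
// Deflater/Inflater, so concurrent callers never share mutable state.
final class DeflateCompressor implements ByteCompressor {
    @Override
    public byte[] compress(byte[] src) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src);
            deflater.finish();
            byte[] out = new byte[src.length + 64];
            int n = 0;
            while (!deflater.finished()) {
                if (n == out.length) {
                    // Incompressible input can grow; enlarge the output buffer.
                    out = Arrays.copyOf(out, out.length * 2);
                }
                n += deflater.deflate(out, n, out.length - n);
            }
            return Arrays.copyOf(out, n);
        } finally {
            deflater.end(); // release native resources
        }
    }

    @Override
    public byte[] decompress(byte[] src, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(src);
            byte[] out = new byte[originalLength];
            int n = 0;
            while (n < originalLength && !inflater.finished()) {
                int r = inflater.inflate(out, n, originalLength - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid input");
                }
                n += r;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because nothing is cached between calls, the same DeflateCompressor instance could serve the storage layer and, say, a network channel at the same time. A production implementation would more likely reuse buffers supplied by the caller rather than allocate on every call.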

Storage Compression

Currently, AsterixDB compresses only the primary index. This can be extended to include the secondary indexes as well.

Look Aside File (LAF):

In AsterixDB, each index is stored on disk in a file made up of fixed-size pages. This offers a deterministic way of accessing any arbitrary page i (offset = i * pageSize). However, the compressor produces pages of variable size, so we need a way to find the offset and the size of the required page i. Therefore, each compressed file (index) has a companion file called the Look Aside File (LAF for short). The LAF consists of multiple entries, each of which stores the offset and the size of one page (entry i corresponds to page i); see Figure 1. Each entry is a pair of 64-bit integers <offset, size>.

 

Figure 1: Compressed file of n pages
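The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the AsterixDB implementation: the class LookAsideFile and its methods are hypothetical, the LAF is held in memory as a ByteBuffer (big-endian by default), it is assumed to have no header so that entry i starts at byte i * 16, and compressed pages are assumed to be packed back to back in the data file.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; names and layout details are assumptions.
final class LookAsideFile {
    // Each entry is a pair of 64-bit integers <offset, size> = 16 bytes.
    static final int ENTRY_SIZE = 2 * Long.BYTES;

    // Location of one compressed page inside the compressed data file.
    static final class Entry {
        final long offset;
        final long size;

        Entry(long offset, long size) {
            this.offset = offset;
            this.size = size;
        }
    }

    // Entries are fixed-size, so entry i is found the same way an
    // uncompressed page would be: position = i * ENTRY_SIZE.
    static long entryPosition(int pageId) {
        return (long) pageId * ENTRY_SIZE;
    }

    // Builds an in-memory LAF from the compressed size of each page,
    // assuming pages are written contiguously starting at offset 0.
    static ByteBuffer build(long[] compressedSizes) {
        ByteBuffer laf = ByteBuffer.allocate(compressedSizes.length * ENTRY_SIZE);
        long offset = 0;
        for (long size : compressedSizes) {
            laf.putLong(offset).putLong(size);
            offset += size;
        }
        laf.flip();
        return laf;
    }

    // Reads <offset, size> for the given page using absolute gets,
    // which leave the buffer's position untouched.
    static Entry lookup(ByteBuffer laf, int pageId) {
        int pos = Math.toIntExact(entryPosition(pageId));
        return new Entry(laf.getLong(pos), laf.getLong(pos + Long.BYTES));
    }
}
```

For example, with three pages whose compressed sizes are 1000, 2500 and 700 bytes, the entry for page 2 sits at byte 32 of the LAF and holds <3500, 700>. Reading a page therefore costs one fixed-position read in the LAF followed by one variable-length read in the compressed file.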
