
Motivation

As data size increases, IO becomes the main bottleneck for many analytical queries. To address this issue, we introduce block compression at the storage level.

User Model

There are two ways to enable compression:

  1. DDL:
    When creating a dataset, the user can specify the compression scheme, as in the following example:
    CREATE DATASET Compressed(CompressedType)
    PRIMARY KEY id
    WITH {"dataset-compression-scheme": "snappy"};

  2. Configuration File:
    Under the [nc] section of the configuration file, the user can specify a compression scheme that is applied to all datasets. The user can override it per dataset by specifying a different value in the DDL, as in (1).
    Example:
    [nc]
    storage.dataset.compression.scheme=snappy


Supported compression schemes: 

Configuration Value    Description
none                   No compression
snappy                 Snappy compression
lz4                    LZ4 compression
lz4hc                  LZ4 high compression


Design

This section details the components added to AsterixDB.

Compressor/Decompressor API:

Two interfaces are introduced at the Hyracks level: ICompressorDecompressorFactory and ICompressorDecompressor. Hyracks offers three built-in compression schemes: Snappy, LZ4, and LZ4HC (LZ4 with High Compression). The compressor/decompressor API is designed to compress/decompress arbitrary byte arrays; it is not limited to storage and could be adopted elsewhere (such as network exchange).

Contract: ICompressorDecompressor must be stateless so that it is thread-safe.
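
To make the contract concrete, here is a minimal, illustrative sketch of the two interfaces together with a stateless no-op scheme. Only computeCompressBufferSize is named in this document; the other method names and signatures are assumptions for illustration and are not the actual Hyracks declarations.

    import java.nio.ByteBuffer;

    // Illustrative sketch of the compressor/decompressor contracts described above.
    // Method names other than computeCompressBufferSize are assumptions.
    interface ICompressorDecompressor {
        /** Worst-case compressed size for an uncompressed buffer of the given size. */
        int computeCompressBufferSize(int uncompressedBufferSize);

        /** Compresses src into dest; returns the number of bytes written to dest. */
        int compress(ByteBuffer src, ByteBuffer dest);

        /** Uncompresses src into dest; returns the number of bytes written to dest. */
        int uncompress(ByteBuffer src, ByteBuffer dest);
    }

    interface ICompressorDecompressorFactory {
        ICompressorDecompressor createInstance();
    }

    /** A stateless "none" scheme: simply copies the bytes. Statelessness keeps it thread-safe. */
    final class NoOpCompressorDecompressor implements ICompressorDecompressor {
        @Override
        public int computeCompressBufferSize(int uncompressedBufferSize) {
            return uncompressedBufferSize; // a plain copy cannot expand the data
        }

        @Override
        public int compress(ByteBuffer src, ByteBuffer dest) {
            int length = src.remaining();
            dest.put(src.duplicate());
            return length;
        }

        @Override
        public int uncompress(ByteBuffer src, ByteBuffer dest) {
            int length = src.remaining();
            dest.put(src.duplicate());
            return length;
        }
    }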

Storage Compression

Currently, AsterixDB compresses only the primary index. This can be extended to cover the secondary indexes as well.

Look Aside File (LAF):

In AsterixDB, indexes are stored on disk as fixed-size pages in a file. This offers a deterministic way of accessing any arbitrary page i (offset = i * pageSize). However, the compressor produces pages of variable sizes, so we need a way to get the proper offset and size of a required page i. Therefore, each compressed file (index) has a companion file called a Look Aside File (LAF for short). The LAF consists of multiple entries, each of which stores the offset and size of a compressed page (entry i corresponds to page i); see Figure 1. Each entry is a pair of 64-bit integers <offset, size>.

The entry of compressed page i is located:

1- At LAF page: (i * ENTRY_LENGTH) / pageSize

2- Within that LAF page, at byte offset: (i * ENTRY_LENGTH) % pageSize

  • where ENTRY_LENGTH = 16 bytes (the size of the <offset, size> pair)

Note that we could infer the size of a compressed page by looking at the next entry's offset. However, for the last entry in a LAF page this would require reading the next page of the LAF file. To keep things simple, we store the compressed page size explicitly as well.
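
As a worked example, the following sketch computes the LAF page index and the entry offset within it for a given page i, following the formulas above. The 128 KB page size in main is only an example value.

    // A minimal sketch of locating the LAF entry for compressed page i.
    // ENTRY_LENGTH is 16 bytes: a 64-bit offset followed by a 64-bit size.
    final class LafEntryLocator {
        static final int ENTRY_LENGTH = 16;

        /** Index of the LAF page that holds the entry of compressed page i. */
        static int lafPageIndex(long i, int pageSize) {
            return (int) ((i * ENTRY_LENGTH) / pageSize);
        }

        /** Byte offset of the entry within that LAF page. */
        static int entryOffsetInLafPage(long i, int pageSize) {
            return (int) ((i * ENTRY_LENGTH) % pageSize);
        }

        public static void main(String[] args) {
            int pageSize = 131072; // example: 128 KB pages
            long i = 10000;
            System.out.println("LAF page: " + lafPageIndex(i, pageSize));
            System.out.println("Entry offset: " + entryOffsetInLafPage(i, pageSize));
        }
    }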

 

Figure 1: Compressed file of n pages

Ingestion Workflow:

Before a page is queued for write, the "page owner" is responsible for preparing the CompressedFileWriter first. This is needed to confiscate the required page(s) of the LAF file before the write can happen (see Figure 2). A CompressedFileWriter can be obtained by calling IBufferCache.getCompressedFileWriter(fileId). In addition to the bulk loader, the MetadataPageManager is also required to prepare the CompressedFileWriter before writing the metadata page (when marking the index as valid). The BufferCache syncs the LAF file before the index file to make sure everything is persisted to disk.

While writing a page, the BufferCache may find that compression did not save any space. In that case, it writes the page as is, without compression. Currently, we implement a naïve policy that requires a saving of at least one byte.
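
A minimal sketch of this write-side decision, reusing the illustrative compressor interface from the earlier sketch. The types (PageWriteResult) and the elided IO calls are assumptions for illustration, not the actual BufferCache/CompressedFileWriter code.

    import java.nio.ByteBuffer;

    // Sketch: compress a page, keep the compressed form only if it saves space,
    // and return the <offset, size> pair to be recorded in the LAF entry.
    final class CompressedPageWriter {

        /** Outcome to be recorded in the LAF entry for this page: <offset, size>. */
        record PageWriteResult(long offset, long size, boolean compressed) {}

        private final ICompressorDecompressor compressor;

        CompressedPageWriter(ICompressorDecompressor compressor) {
            this.compressor = compressor;
        }

        PageWriteResult write(ByteBuffer page, long fileOffset) {
            ByteBuffer compressed =
                    ByteBuffer.allocate(compressor.computeCompressBufferSize(page.remaining()));
            int compressedLength = compressor.compress(page.duplicate(), compressed);

            // Naive policy: keep the compressed form only if it saves at least one byte.
            if (compressedLength < page.remaining()) {
                // ... write 'compressed' bytes at fileOffset (IO elided) ...
                return new PageWriteResult(fileOffset, compressedLength, true);
            }
            // Otherwise write the page as is, without compression.
            // ... write 'page' bytes at fileOffset (IO elided) ...
            return new PageWriteResult(fileOffset, page.remaining(), false);
        }
    }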

Figure 2: Ingestion workflow for compressed indexes


Query Workflow:

During read operations, the whole workflow is managed by the buffer cache (Figure 3). Reading a compressed page may incur two IO operations (reads): one for the LAF page and one for the compressed page. However, the LAF file is usually small (a few pages) and will mostly be cached during sequential reads.
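
The following sketch illustrates the two potential reads: first the LAF entry <offset, size> of page i, then the compressed page itself. The IO helpers are elided, and the way an uncompressed page is detected (a stored size equal to pageSize) is an assumption of this sketch rather than a statement about the actual BufferCache.

    import java.nio.ByteBuffer;

    // Sketch of the read path: look up the LAF entry, read the compressed bytes,
    // and uncompress them into a full page buffer.
    final class CompressedPageReader {

        private final ICompressorDecompressor compressor;
        private final int pageSize;

        CompressedPageReader(ICompressorDecompressor compressor, int pageSize) {
            this.compressor = compressor;
            this.pageSize = pageSize;
        }

        ByteBuffer readPage(long i) {
            // (1) Possibly one IO: the LAF page holding entry i (often already cached).
            ByteBuffer lafPage = readLafPage(LafEntryLocator.lafPageIndex(i, pageSize));
            int entryOffset = LafEntryLocator.entryOffsetInLafPage(i, pageSize);
            long offset = lafPage.getLong(entryOffset);
            long size = lafPage.getLong(entryOffset + Long.BYTES);

            // (2) One IO: read 'size' bytes starting at 'offset' from the index file.
            ByteBuffer compressed = readFromIndexFile(offset, (int) size);

            // Assumption of this sketch: a stored size equal to pageSize means the page
            // was written without compression and can be returned directly.
            if (size == pageSize) {
                return compressed;
            }
            ByteBuffer uncompressed = ByteBuffer.allocate(pageSize);
            compressor.uncompress(compressed, uncompressed);
            uncompressed.flip();
            return uncompressed;
        }

        private ByteBuffer readLafPage(int lafPageIndex) {
            // Elided: fetch the LAF page from the buffer cache or disk.
            throw new UnsupportedOperationException("IO elided in this sketch");
        }

        private ByteBuffer readFromIndexFile(long offset, int length) {
            // Elided: read 'length' bytes at 'offset' from the compressed index file.
            throw new UnsupportedOperationException("IO elided in this sketch");
        }
    }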

Figure 3: Read compressed page

Additional Compress/Uncompress Buffers:

For compress/decompress operations, the buffer cache reuses the buffer of BufferCacheHeaderHelper. When needed, it increases that buffer's size to what the compressor requires (determined by calling ICompressorDecompressor.computeCompressBufferSize(uncompressedBufferSize)). As a result, the compressor/decompressor does not allocate any additional buffers for each read/write operation.
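
A small sketch of this buffer-reuse idea: one shared buffer is sized up front (and grown only when needed) using computeCompressBufferSize, so no per-operation allocation happens. HeaderBufferHolder is an illustrative stand-in, not the actual BufferCacheHeaderHelper API.

    import java.nio.ByteBuffer;

    // Sketch: a reusable buffer grown to the compressor's worst-case size.
    final class HeaderBufferHolder {
        private ByteBuffer buffer = ByteBuffer.allocate(0);

        /** Grows the shared buffer (only if needed) and returns it ready for use. */
        ByteBuffer ensureCapacity(ICompressorDecompressor compressor, int uncompressedSize) {
            int required = compressor.computeCompressBufferSize(uncompressedSize);
            if (buffer.capacity() < required) {
                buffer = ByteBuffer.allocate(required);
            }
            buffer.clear();
            return buffer;
        }
    }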

Large Pages:

When reading or writing large pages, the BufferCache decomposes the large page into multiple pages and processes each one separately, which caps the size of the compressor/decompressor buffers.
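
A minimal sketch of such a decomposition: a large page is split into pageSize-sized chunks so each chunk can be compressed or decompressed on its own. The splitting logic here is illustrative only, not the actual BufferCache handling of large pages.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: split a large page into chunks of at most pageSize bytes.
    final class LargePageSplitter {
        static List<ByteBuffer> split(ByteBuffer largePage, int pageSize) {
            List<ByteBuffer> chunks = new ArrayList<>();
            ByteBuffer src = largePage.duplicate();
            while (src.hasRemaining()) {
                int length = Math.min(pageSize, src.remaining());
                ByteBuffer chunk = src.slice();
                chunk.limit(length);
                chunks.add(chunk);
                src.position(src.position() + length);
            }
            return chunks;
        }
    }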

...