
Motivation

As data size increases, IO becomes the main bottleneck for many analytical queries. To address this issue, we introduce block compression at the storage level.

User Model

There are two ways to enable compression:

  1. DDL:
    When creating a dataset, the user can specify the compression scheme, as in the following example:
    CREATE DATASET Compressed(CompressedType)
    PRIMARY KEY id
    WITH {"dataset-compression-scheme": "snappy"};

  2. Configuration File:
    Under the [nc] section of the configuration file, the user can specify a compression scheme that is applied to all datasets. The user can override it per dataset by specifying a different value in the DDL, as in (1).
    Example:
    [nc]
    storage.dataset.compression.scheme=snappy


Supported compression schemes: 

Configuration Value    Description
none                   No compression
snappy                 Snappy compression
lz4                    LZ4 compression
lz4hc                  LZ4 high compression


Design

This section details the components added to AsterixDB.

Compressor/Decompressor API:

Two interfaces are introduced at the Hyracks level: ICompressorDecompressorFactory and ICompressorDecompressor. Hyracks offers three built-in compression schemes: Snappy, LZ4, and LZ4HC (LZ4 with High Compression). The compressor/decompressor API is designed to compress/decompress arbitrary byte arrays; it is not limited to storage and could be adopted elsewhere (such as network exchange).

Contract: ICompressorDecompressor must be stateless so that it is thread-safe.
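
To make the contract concrete, here is a minimal, illustrative sketch of the two interfaces together with a stateless no-op scheme. Only computeCompressBufferSize is named in this document; the other method names and signatures are assumptions for illustration and are not the actual Hyracks declarations.

    import java.nio.ByteBuffer;

    // Illustrative sketch of the compressor/decompressor contracts described above.
    // Method names other than computeCompressBufferSize are assumptions.
    interface ICompressorDecompressor {
        /** Worst-case compressed size for an uncompressed buffer of the given size. */
        int computeCompressBufferSize(int uncompressedBufferSize);

        /** Compresses src into dest; returns the number of bytes written to dest. */
        int compress(ByteBuffer src, ByteBuffer dest);

        /** Uncompresses src into dest; returns the number of bytes written to dest. */
        int uncompress(ByteBuffer src, ByteBuffer dest);
    }

    interface ICompressorDecompressorFactory {
        ICompressorDecompressor createInstance();
    }

    /** A stateless "none" scheme: simply copies the bytes. Statelessness keeps it thread-safe. */
    final class NoOpCompressorDecompressor implements ICompressorDecompressor {
        @Override
        public int computeCompressBufferSize(int uncompressedBufferSize) {
            return uncompressedBufferSize; // a plain copy cannot expand the data
        }

        @Override
        public int compress(ByteBuffer src, ByteBuffer dest) {
            int length = src.remaining();
            dest.put(src.duplicate());
            return length;
        }

        @Override
        public int uncompress(ByteBuffer src, ByteBuffer dest) {
            int length = src.remaining();
            dest.put(src.duplicate());
            return length;
        }
    }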

Storage Compression

Currently, AsterixDB compresses only the primary index. This can be extended to cover the secondary indexes as well.

Look Aside File (LAF):

In AsterixDB, indexes are stored on disk as fixed-size pages in a file. This offers a deterministic way of accessing any arbitrary page i (offset = i * pageSize). However, the compressor produces pages of variable sizes, so we need a way to get the proper offset and size of a required page i. Therefore, each compressed file (index) has a companion file called a Look Aside File (LAF for short). The LAF consists of multiple entries, each of which stores the offset and size of a compressed page (entry i corresponds to page i); see Figure 1. Each entry is a pair of 64-bit integers <offset, size>.

The entry of compressed page i is located:

1- At LAF page: (i * ENTRY_LENGTH) / pageSize

2- Within that LAF page, at byte offset: (i * ENTRY_LENGTH) % pageSize

  • where ENTRY_LENGTH = 16 bytes (the size of the <offset, size> pair)

Note that we could infer the size of a compressed page by looking at the next entry's offset. However, for the last entry in a LAF page this would require reading the next page of the LAF file. To keep things simple, we store the compressed page size explicitly as well.
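
As a worked example, the following sketch computes the LAF page index and the entry offset within it for a given page i, following the formulas above. The 128 KB page size in main is only an example value.

    // A minimal sketch of locating the LAF entry for compressed page i.
    // ENTRY_LENGTH is 16 bytes: a 64-bit offset followed by a 64-bit size.
    final class LafEntryLocator {
        static final int ENTRY_LENGTH = 16;

        /** Index of the LAF page that holds the entry of compressed page i. */
        static int lafPageIndex(long i, int pageSize) {
            return (int) ((i * ENTRY_LENGTH) / pageSize);
        }

        /** Byte offset of the entry within that LAF page. */
        static int entryOffsetInLafPage(long i, int pageSize) {
            return (int) ((i * ENTRY_LENGTH) % pageSize);
        }

        public static void main(String[] args) {
            int pageSize = 131072; // example: 128 KB pages
            long i = 10000;
            System.out.println("LAF page: " + lafPageIndex(i, pageSize));
            System.out.println("Entry offset: " + entryOffsetInLafPage(i, pageSize));
        }
    }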

 

Figure 1: Compressed file of n pages

Ingestion Workflow:

Before a page is queued for write, the "page owner" is responsible for preparing the CompressedFileWriter first. This is needed to confiscate the required page(s) of the LAF file before the write can happen (see Figure 2). A CompressedFileWriter can be obtained by calling IBufferCache.getCompressedFileWriter(fileId). In addition to the bulk loader, the MetadataPageManager is also required to prepare the CompressedFileWriter before writing the metadata page (when marking the index as valid). The BufferCache syncs the LAF file before the index file to make sure everything is persisted to disk.

While writing a page, the BufferCache may find that compression did not save any space. In that case, it writes the page as is, without compression. Currently, we implement a naïve policy that requires a saving of at least one byte.
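
A minimal sketch of this write-side decision, reusing the illustrative compressor interface from the earlier sketch. The types (PageWriteResult) and the elided IO calls are assumptions for illustration, not the actual BufferCache/CompressedFileWriter code.

    import java.nio.ByteBuffer;

    // Sketch: compress a page, keep the compressed form only if it saves space,
    // and return the <offset, size> pair to be recorded in the LAF entry.
    final class CompressedPageWriter {

        /** Outcome to be recorded in the LAF entry for this page: <offset, size>. */
        record PageWriteResult(long offset, long size, boolean compressed) {}

        private final ICompressorDecompressor compressor;

        CompressedPageWriter(ICompressorDecompressor compressor) {
            this.compressor = compressor;
        }

        PageWriteResult write(ByteBuffer page, long fileOffset) {
            ByteBuffer compressed =
                    ByteBuffer.allocate(compressor.computeCompressBufferSize(page.remaining()));
            int compressedLength = compressor.compress(page.duplicate(), compressed);

            // Naive policy: keep the compressed form only if it saves at least one byte.
            if (compressedLength < page.remaining()) {
                // ... write 'compressed' bytes at fileOffset (IO elided) ...
                return new PageWriteResult(fileOffset, compressedLength, true);
            }
            // Otherwise write the page as is, without compression.
            // ... write 'page' bytes at fileOffset (IO elided) ...
            return new PageWriteResult(fileOffset, page.remaining(), false);
        }
    }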

Figure 2: Ingestion workflow for compressed indexes


Query Workflow:

During read operations, the whole workflow is managed by the buffer cache (Figure 3). Reading a compressed page may incur two IO operations (reads): one for the LAF page and one for the compressed page. However, the LAF file is usually small (a few pages) and will mostly be cached during sequential reads.
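
The following sketch illustrates the two potential reads: first the LAF entry <offset, size> of page i, then the compressed page itself. The IO helpers are elided, and the way an uncompressed page is detected (a stored size equal to pageSize) is an assumption of this sketch rather than a statement about the actual BufferCache.

    import java.nio.ByteBuffer;

    // Sketch of the read path: look up the LAF entry, read the compressed bytes,
    // and uncompress them into a full page buffer.
    final class CompressedPageReader {

        private final ICompressorDecompressor compressor;
        private final int pageSize;

        CompressedPageReader(ICompressorDecompressor compressor, int pageSize) {
            this.compressor = compressor;
            this.pageSize = pageSize;
        }

        ByteBuffer readPage(long i) {
            // (1) Possibly one IO: the LAF page holding entry i (often already cached).
            ByteBuffer lafPage = readLafPage(LafEntryLocator.lafPageIndex(i, pageSize));
            int entryOffset = LafEntryLocator.entryOffsetInLafPage(i, pageSize);
            long offset = lafPage.getLong(entryOffset);
            long size = lafPage.getLong(entryOffset + Long.BYTES);

            // (2) One IO: read 'size' bytes starting at 'offset' from the index file.
            ByteBuffer compressed = readFromIndexFile(offset, (int) size);

            // Assumption of this sketch: a stored size equal to pageSize means the page
            // was written without compression and can be returned directly.
            if (size == pageSize) {
                return compressed;
            }
            ByteBuffer uncompressed = ByteBuffer.allocate(pageSize);
            compressor.uncompress(compressed, uncompressed);
            uncompressed.flip();
            return uncompressed;
        }

        private ByteBuffer readLafPage(int lafPageIndex) {
            // Elided: fetch the LAF page from the buffer cache or disk.
            throw new UnsupportedOperationException("IO elided in this sketch");
        }

        private ByteBuffer readFromIndexFile(long offset, int length) {
            // Elided: read 'length' bytes at 'offset' from the compressed index file.
            throw new UnsupportedOperationException("IO elided in this sketch");
        }
    }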

Figure 3: Read compressed page

Additional Compress/Uncompress Buffers:

For compress/decompress operations, the buffer cache reuses the buffer of BufferCacheHeaderHelper. When needed, it increases that buffer's size to what the compressor requires (determined by calling ICompressorDecompressor.computeCompressBufferSize(uncompressedBufferSize)). As a result, the compressor/decompressor does not allocate any additional buffers for each read/write operation.
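
A small sketch of this buffer-reuse idea: one shared buffer is sized up front (and grown only when needed) using computeCompressBufferSize, so no per-operation allocation happens. HeaderBufferHolder is an illustrative stand-in, not the actual BufferCacheHeaderHelper API.

    import java.nio.ByteBuffer;

    // Sketch: a reusable buffer grown to the compressor's worst-case size.
    final class HeaderBufferHolder {
        private ByteBuffer buffer = ByteBuffer.allocate(0);

        /** Grows the shared buffer (only if needed) and returns it ready for use. */
        ByteBuffer ensureCapacity(ICompressorDecompressor compressor, int uncompressedSize) {
            int required = compressor.computeCompressBufferSize(uncompressedSize);
            if (buffer.capacity() < required) {
                buffer = ByteBuffer.allocate(required);
            }
            buffer.clear();
            return buffer;
        }
    }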

Large Pages:

When reading or writing large pages, the BufferCache decomposes the large page into multiple pages and processes each one separately, which caps the size of the compressor/decompressor buffers.
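
A minimal sketch of such a decomposition: a large page is split into pageSize-sized chunks so each chunk can be compressed or decompressed on its own. The splitting logic here is illustrative only, not the actual BufferCache handling of large pages.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: split a large page into chunks of at most pageSize bytes.
    final class LargePageSplitter {
        static List<ByteBuffer> split(ByteBuffer largePage, int pageSize) {
            List<ByteBuffer> chunks = new ArrayList<>();
            ByteBuffer src = largePage.duplicate();
            while (src.hasRemaining()) {
                int length = Math.min(pageSize, src.remaining());
                ByteBuffer chunk = src.slice();
                chunk.limit(length);
                chunks.add(chunk);
                src.position(src.position() + length);
            }
            return chunks;
        }
    }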

...