Experiments

This section details the experiments on compressed storage. The experiments were done using two datasets, 1) the SocialGen dataset and 2) synthetic tweets, and were conducted on two types of drives: an external HDD connected over USB 3.0 and an SSD.

Configuration Setup:

OS: OSX 10.11.6 (El Capitan)
Memory: 16GB
Peak drive read/write throughput:
- SSD (Read: ~715MB/s, Write: ~640MB/s)
- HDD (Read: ~100MB/s, Write: ~100MB/s)

AsterixDB Configuration:

Buffer cache: 7GB
Buffer cache page size: 256KB
Memory component budget: 2GB
Memory component page size: 64KB
Max writable datasets: 2

SocialGen (Data Scan):

GleambookMessages raw size: 46GB
Comparing: Uncompressed vs. compressed with Snappy, LZ4, and LZ4HC
Indexes: authorId (B-Tree)
Load: Bulkload
# of IODevices: 2

Data Loading Time:

Time taken to bulkload (lower is better)


On-disk size:

Data Scan execution time:

Query: SELECT COUNT(*) FROM GleambookMessages

The query was executed 7 times, and the first two runs were dropped.
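The measurement protocol above can be sketched as follows. This is a minimal sketch that assumes the per-run times are already collected as a list of seconds. It also assumes the reported figure is the mean of the five remaining runs. The page does not say how the kept runs are aggregated, and the function name `summarize_runs` is hypothetical.

```python
from statistics import mean

WARMUP_RUNS = 2   # runs discarded, per the protocol above
TOTAL_RUNS = 7    # total executions of the query


def summarize_runs(run_times_s: list[float]) -> float:
    """Drop the first WARMUP_RUNS timings and average the rest.

    Averaging is an assumption; the original page only says the
    first two of seven runs were dropped.
    """
    if len(run_times_s) != TOTAL_RUNS:
        raise ValueError(f"expected {TOTAL_RUNS} runs, got {len(run_times_s)}")
    kept = run_times_s[WARMUP_RUNS:]  # the 5 runs that count
    return mean(kept)
```

Swapping `mean` for `statistics.median` would be a one-line change if the kept runs were summarized differently.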

SSD Result (lower is better)

HDD Result (lower is better)
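As a rough point of reference, the peak read throughputs listed in the configuration bound how fast a full scan can go. The sketch below is a back-of-envelope estimate, not a measured result. It assumes that:

- the scan is purely sequential and I/O-bound;
- the drive reads at the quoted peak rate;
- the on-disk size equals the 46GB raw size (compressed datasets would read fewer bytes);
- 1GB = 1024MB.

The function name `min_scan_seconds` is hypothetical.

```python
# Peak sequential read rates from the configuration section, in MB/s.
PEAK_READ_MB_S = {"SSD": 715, "HDD": 100}


def min_scan_seconds(size_gb: float, read_mb_s: float) -> float:
    """Lower bound on full-scan time: data volume / peak read throughput.

    Assumes 1GB = 1024MB and a purely sequential, I/O-bound scan.
    """
    return size_gb * 1024 / read_mb_s


for drive, rate in PEAK_READ_MB_S.items():
    print(f"{drive}: >= {min_scan_seconds(46, rate):.0f} s to read 46GB")
```

Under these assumptions, reading 46GB takes at least about 66 s on the SSD and about 471 s on the HDD. That is roughly a 7x gap, so a smaller on-disk footprint could plausibly matter more on the HDD.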

Twitter (Secondary index queries)

Raw size: 50GB
Comparing: Uncompressed vs. compressed with Snappy (referred to as "Compressed" in the charts below)
Indexes: timestamp (B-Tree)
Load: Socket feed
# of IODevices: 1 (SSD only)

This experiment is intended to show the impact, if any, of compression on queries with highly selective predicates.

Data Loading Time (lower is better):


On-disk size:


Query execution time:

Point lookups

Query: SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = <TIMESTAMP>

Ordered Access:

We ran the query with 3000 different timestamps in increasing order. Each timestamp corresponds to 1000 records.
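The ordered-access workload above can be sketched as a generator of query strings. This is a minimal sketch that assumes the distinct `timestamp_ms` values (3000 in the experiment) are already known, for example collected from the loaded data. The helper name `ordered_point_lookups` is hypothetical, and submitting each query to AsterixDB is left out.

```python
from typing import Iterable

# Same point-lookup query as above, with the timestamp filled in.
QUERY_TEMPLATE = "SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = {ts}"


def ordered_point_lookups(timestamps: Iterable[int]) -> list[str]:
    """Return one point-lookup query per distinct timestamp, in increasing order."""
    distinct = sorted(set(timestamps))  # dedupe, then order ascending
    return [QUERY_TEMPLATE.format(ts=ts) for ts in distinct]
```

Since each timestamp corresponds to 1000 records, each generated query should return a count of 1000, which gives a simple sanity check on the loaded data.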
