...
Experiments
This section details the experiments on compressed storage. The experiments were conducted on two datasets: 1) the SocialGen dataset and 2) synthetic Tweets, using two types of hard drives (an external HDD over USB 3.0 and an SSD).
Configuration Setup:
- OS: OSX 10.11.6 (El Capitan)
- Memory: 16GB
- Hard drives read/write peaks:
- SSD (Read: ~715MB/s Write: ~640MB/s)
- HDD (Read: ~100MB/s Write: ~100MB/s)
AsterixDB Configuration:
- Buffer cache: 7GB
- Buffer cache page size: 256KB
- Memory component budget: 2GB
- Memory component page size: 64KB
- Max writable datasets: 2
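The settings above map onto AsterixDB's configuration file. The sketch below shows one plausible way to express them; the exact parameter names and section layout depend on the AsterixDB version under test and should be verified against its documentation, so treat every key here as an assumption.

```ini
; Sketch (not verbatim): the storage-related settings used in these
; experiments, written in AsterixDB's cc.conf style.
[common]
storage.buffercache.size = 7GB
storage.buffercache.pagesize = 256KB
storage.memorycomponent.globalbudget = 2GB
storage.memorycomponent.pagesize = 64KB
storage.max.active.writable.datasets = 2
```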
SocialGen (Data Scan):
- GleambookMessages raw size: 46GB
- Comparing: Uncompressed and Compressed with: Snappy, LZ4, and LZ4HC
- Indexes: authorId (B-Tree)
- Load: Bulkload
- # of IODevices: 2
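The compressed variants of the dataset can be declared with AsterixDB's dataset-level `storage-block-compression` option. The DDL below is a sketch: the type definition is abbreviated to two assumed fields, and the index name is made up, but it shows how the compression scheme (`snappy`, `lz4`, or `lz4hc`) is selected per dataset, with the uncompressed baseline simply omitting the WITH clause.

```sql
-- Sketch: a compressed GleambookMessages dataset with the B-Tree
-- secondary index on authorId. Field names are assumptions.
CREATE TYPE GleambookMessageType AS { messageId: bigint, authorId: bigint };

CREATE DATASET GleambookMessages(GleambookMessageType)
    PRIMARY KEY messageId
    WITH {"storage-block-compression": {"scheme": "snappy"}};

CREATE INDEX gbAuthorIdx ON GleambookMessages(authorId);
```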
Data Loading Time:
Time taken for the bulkload (lower is better)
On-disk size:
Data Scan execution time:
Query: SELECT COUNT(*) FROM GleambookMessages
The query was executed 7 times; the first two (warm-up) runs were dropped.
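The measurement protocol (7 runs, drop the first two, average the rest) can be sketched as a small harness. The `run_query` helper targets AsterixDB's HTTP query service; the host and port are assumptions for a local single-node instance, and the timing logic is independent of how the query is actually executed.

```python
import time
import urllib.parse
import urllib.request

def run_query(statement, url="http://localhost:19002/query/service"):
    """Execute a SQL++ statement via AsterixDB's HTTP query service.

    The endpoint path is AsterixDB's standard query service; the host
    and port are assumptions for a local single-node setup.
    """
    data = urllib.parse.urlencode({"statement": statement}).encode()
    with urllib.request.urlopen(url, data) as resp:
        return resp.read()

def timed_runs(execute, runs=7, warmup=2):
    """Time execute() `runs` times and report the mean of the runs
    remaining after the first `warmup` (cold) runs are dropped."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        execute()
        times.append(time.perf_counter() - start)
    kept = times[warmup:]
    return sum(kept) / len(kept)
```

Usage: `timed_runs(lambda: run_query("SELECT COUNT(*) FROM GleambookMessages"))` returns the average wall-clock time of the last five runs.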
- SSD Result (lower is better)
- HDD Result (lower is better)
Twitter (Secondary index queries)
- Raw size: 50GB
- Comparing: Uncompressed and Compressed with: Snappy (referred to as Compressed in the charts below)
- Indexes: timestamp (B-Tree)
- Load: Socket feed
- # of IODevices: 1 (ONLY SSD)
This experiment is intended to show the impact of compression on queries with highly selective predicates.
Data Loading Time (lower is better):
On-disk size:
Data Scan execution time:
Point Lookups
Query: SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = <TIMESTAMP>
- Ordered Access:
We run the query with 3000 different timestamps in increasing order (timestamp1 < timestamp2 < ...), in two variants:
- Each timestamp corresponds to 1,000 records.
- Each timestamp corresponds to one record.
- Random Access
We run the query with 500 different timestamps in random order (the timestamps are randomly shuffled), in two variants:
- Each timestamp corresponds to 1,000 records.
- Each timestamp corresponds to one record.
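The ordered and random access patterns differ only in how the list of probe timestamps is sequenced. The helper below sketches that: it builds the point-lookup statements for one run, issuing timestamps in increasing order or shuffled. The function name and the seed parameter are illustrative additions; query execution itself is left to a harness like the one above.

```python
import random

def lookup_workload(timestamps, ordered=True, seed=None):
    """Build the sequence of point-lookup statements for one run.

    Ordered access issues the timestamps in increasing order; random
    access shuffles them first (seed is only for reproducibility).
    """
    ts = sorted(timestamps)
    if not ordered:
        rng = random.Random(seed)
        rng.shuffle(ts)
    return [f"SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = {t}"
            for t in ts]
```

For the ordered experiment one would pass 3000 timestamps with `ordered=True`; for the random experiment, 500 timestamps with `ordered=False`.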