...
Experiments
This section details the experiments on compressed storage. The experiments were conducted on two datasets: 1) the SocialGen dataset and 2) synthetic Tweets, using two types of hard drives (an external HDD over USB 3.0 and an SSD).
Configuration Setup:
- OS: OSX 10.11.6 (El Capitan)
- Memory: 16GB
- Hard drives read/write peaks:
- SSD (Read: ~715MB/s Write: ~640MB/s)
- HDD (Read: ~100MB/s Write: ~100MB/s)
AsterixDB Configuration:
- Buffer cache: 7GB
- Buffer cache page size: 256KB
- Memory component budget: 2GB
- Memory component page size: 64KB
- Max writable datasets: 2
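The settings above map onto AsterixDB's configuration file. The sketch below shows one plausible way to express them; the exact parameter names and section layout depend on the AsterixDB version under test and should be verified against its documentation, so treat every key here as an assumption.

```ini
; Sketch (not verbatim): the storage-related settings used in these
; experiments, written in AsterixDB's cc.conf style.
[common]
storage.buffercache.size = 7GB
storage.buffercache.pagesize = 256KB
storage.memorycomponent.globalbudget = 2GB
storage.memorycomponent.pagesize = 64KB
storage.max.active.writable.datasets = 2
```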
SocialGen (Data Scan):
- GleambookMessages raw size: 46GB
- Comparing: Uncompressed and Compressed with: Snappy, LZ4, and LZ4HC
- Indexes: authorId (B-Tree)
- Load: Bulkload
- # of IODevices: 2
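The compressed variants of the dataset can be declared with AsterixDB's dataset-level `storage-block-compression` option. The DDL below is a sketch: the type definition is abbreviated to two assumed fields, and the index name is made up, but it shows how the compression scheme (`snappy`, `lz4`, or `lz4hc`) is selected per dataset, with the uncompressed baseline simply omitting the WITH clause.

```sql
-- Sketch: a compressed GleambookMessages dataset with the B-Tree
-- secondary index on authorId. Field names are assumptions.
CREATE TYPE GleambookMessageType AS { messageId: bigint, authorId: bigint };

CREATE DATASET GleambookMessages(GleambookMessageType)
    PRIMARY KEY messageId
    WITH {"storage-block-compression": {"scheme": "snappy"}};

CREATE INDEX gbAuthorIdx ON GleambookMessages(authorId);
```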
Data Loading Time:
Time taken for the bulkload (lower is better)
On-disk size:
Data Scan execution time:
Query: SELECT COUNT(*) FROM GleambookMessages
The query was executed 7 times; the first two (warm-up) runs were dropped.
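The measurement protocol (7 runs, drop the first two, average the rest) can be sketched as a small harness. The `run_query` helper targets AsterixDB's HTTP query service; the host and port are assumptions for a local single-node instance, and the timing logic is independent of how the query is actually executed.

```python
import time
import urllib.parse
import urllib.request

def run_query(statement, url="http://localhost:19002/query/service"):
    """Execute a SQL++ statement via AsterixDB's HTTP query service.

    The endpoint path is AsterixDB's standard query service; the host
    and port are assumptions for a local single-node setup.
    """
    data = urllib.parse.urlencode({"statement": statement}).encode()
    with urllib.request.urlopen(url, data) as resp:
        return resp.read()

def timed_runs(execute, runs=7, warmup=2):
    """Time execute() `runs` times and report the mean of the runs
    remaining after the first `warmup` (cold) runs are dropped."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        execute()
        times.append(time.perf_counter() - start)
    kept = times[warmup:]
    return sum(kept) / len(kept)
```

Usage: `timed_runs(lambda: run_query("SELECT COUNT(*) FROM GleambookMessages"))` returns the average wall-clock time of the last five runs.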
- SSD Result (lower is better)
- HDD Result (lower is better)
Twitter (Secondary index queries)
- Raw size: 50GB
- Comparing: Uncompressed and Compressed with: Snappy (referred to as Compressed in the charts below)
- Indexes: timestamp (B-Tree)
- Load: Socket feed
- # of IODevices: 1 (ONLY SSD)
This experiment is intended to show the impact of compression on queries with highly selective predicates.
Data Loading Time (lower is better):
On-disk size:
Data Scan execution time:
Point Lookups
Query: SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = <TIMESTAMP>
- Ordered Access:
We run the query with 3000 different timestamps in increasing order (timestamp1 < timestamp2 < ...), in two variants:
- Each timestamp corresponds to 1,000 records.
- Each timestamp corresponds to one record.
- Random Access
We run the query with 500 different timestamps in random order (the timestamps are randomly shuffled), in two variants:
- Each timestamp corresponds to 1,000 records.
- Each timestamp corresponds to one record.
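The ordered and random access patterns differ only in how the list of probe timestamps is sequenced. The helper below sketches that: it builds the point-lookup statements for one run, issuing timestamps in increasing order or shuffled. The function name and the seed parameter are illustrative additions; query execution itself is left to a harness like the one above.

```python
import random

def lookup_workload(timestamps, ordered=True, seed=None):
    """Build the sequence of point-lookup statements for one run.

    Ordered access issues the timestamps in increasing order; random
    access shuffles them first (seed is only for reproducibility).
    """
    ts = sorted(timestamps)
    if not ordered:
        rng = random.Random(seed)
        rng.shuffle(ts)
    return [f"SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = {t}"
            for t in ts]
```

For the ordered experiment one would pass 3000 timestamps with `ordered=True`; for the random experiment, 500 timestamps with `ordered=False`.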