Experiments

This section details the experiments on compressed storage. The experiments were done using two datasets, 1) the SocialGen dataset and 2) synthetic tweets, and were conducted on two types of drives: an external HDD connected over USB 3.0 and an SSD.

Configuration Setup:

  • OS: OSX 10.11.6 (El Capitan)
  • Memory: 16GB
  • Hard drives read/write peaks:
    • SSD (Read: ~715MB/s, Write: ~640MB/s)
    • HDD (Read: ~100MB/s, Write: ~100MB/s)

AsterixDB Configuration:

  • Buffer cache: 7GB
  • Buffer cache page size: 256KB
  • Memory component budget: 2GB
  • Memory component page size: 64KB
  • Max writable datasets: 2
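
As a quick sanity check on the settings above, the page counts implied by each budget and page size can be worked out directly. This is only an arithmetic sketch: it assumes binary units (1GB = 1024^3 bytes, 1KB = 1024 bytes) and that each budget is divided evenly into pages, and the helper name `page_count` is made up for illustration.

```python
KB = 1024
GB = 1024 ** 3


def page_count(budget_bytes, page_size_bytes):
    """Number of whole pages that fit in the given memory budget."""
    return budget_bytes // page_size_bytes


# Buffer cache: 7GB budget split into 256KB pages.
buffer_cache_pages = page_count(7 * GB, 256 * KB)
# Memory component: 2GB budget split into 64KB pages.
memory_component_pages = page_count(2 * GB, 64 * KB)

print(buffer_cache_pages, memory_component_pages)  # prints "28672 32768"
```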

Social Gen (Data Scan):

  • GleambookMessages raw size: 46GB 
  • Comparing: Uncompressed and Compressed with: Snappy, LZ4, and LZ4HC
  • Indexes: authorId (B-Tree)
  • Load: Bulkload
  • # of IODevices: 2

Data Loading Time:

Time taken for the bulk load (lower is better)

On-disk size:

Data Scan execution time:

Query: SELECT COUNT(*) FROM GleambookMessage

The query was executed 7 times, and the first two runs were discarded.
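
The measurement protocol (7 executions, first two discarded) could be sketched roughly as follows. This is a minimal illustration, not the harness used for these experiments: `run_query` is a hypothetical callable standing in for whatever submits the query to AsterixDB and waits for the result. How the remaining five timings were aggregated is not stated here, so the mean and standard deviation in `summarize` are an assumption.

```python
import statistics
import time


def timed_runs(run_query, total_runs=7, warmup_runs=2):
    """Run `run_query` total_runs times; return the timings (seconds) kept after warm-up."""
    timings = []
    for _ in range(total_runs):
        start = time.perf_counter()
        run_query()  # hypothetical: submits the query and blocks until it finishes
        timings.append(time.perf_counter() - start)
    # Discard the first runs (treated here as warm-up).
    return timings[warmup_runs:]


def summarize(timings):
    """Return (mean, sample standard deviation) of the kept timings (assumed aggregation)."""
    return statistics.mean(timings), statistics.stdev(timings)
```

With the defaults, `timed_runs` returns five timings per configuration, which can then be passed to `summarize` to compare the uncompressed and compressed runs.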

  • SSD Result (lower is better)

  • HDD Result (lower is better)

Twitter (Secondary index queries)

  • Raw size: 50GB 
  • Comparing: Uncompressed and Compressed with: Snappy (referred to as Compressed in the charts below)
  • Indexes: timestamp (B-Tree)
  • Load: Socket feed
  • # of IODevices: 1 (ONLY SSD)

This experiment is intended to show the impact, if any, of compression on queries with highly selective predicates.

Data Loading Time (lower is better):

On-disk size:

Data Scan execution time:

  1. Point lookups

Query: SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = <TIMESTAMP>

  • Ordered Access:

We ran the query with 3000 different timestamps in increasing order. Each timestamp corresponds to 1000 records.
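
The ordered-access run can be pictured as one point-lookup query per timestamp, issued in increasing timestamp order. The sketch below only builds the query strings. The timestamp values and the helper name `build_point_lookups` are made up for illustration, and whether the literal needs quoting depends on the type of `timestamp_ms`, which is not stated here.

```python
QUERY_TEMPLATE = "SELECT COUNT(*) FROM Tweets WHERE timestamp_ms = {ts}"


def build_point_lookups(timestamps):
    """Return one point-lookup query per distinct timestamp, in increasing order."""
    return [QUERY_TEMPLATE.format(ts=ts) for ts in sorted(set(timestamps))]


# Hypothetical timestamps: 3000 distinct values, deliberately unsorted on input.
sample_timestamps = [1_500_000_000_000 + i for i in reversed(range(3000))]
queries = build_point_lookups(sample_timestamps)
```

Sorting before issuing the queries is what makes the access pattern "ordered": consecutive lookups touch neighbouring keys in the timestamp index.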