Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

[1] https://docs.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-2017
[2] https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress
[3] https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html
[
4] https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/

Columnar Store

Usually data is stored in row format, when all attributes are stored together. Alternative approach is to store data column-wise. In this case all values of a specific column for a set of rows are stored near each other. This allows to improve compression rates dramatically, up to 10x. This approach saves a lot of spaces and improves scan speed, especially in OLAP cases. However, it suffers from read amplification for row lookups and write amplification for updates. Hence, it is usually used for cold data. 

Advantages:

  • Dramatic disk savings - up to 10x-15x on typical historical data sets
  • Improved scan speed

Disadvantages:

  • Very slow lookups
  • Very slow updates

Examples:

  • Oracle Exadata Hybrid Columnar Compression - oragnize rows into groups, then transform them to columnar format and applies compression [1]
    SQL Server Columnsotre Indexes - very similar to Oracle's idea [2] [3] [4]

[1] http://www.oracle.com/technetwork/database/features/availability/311358-132337.pdf
[2] https://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx
[3] https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
[4] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression

 

 

Draft materials

Compression options:
51) Column storeMISC
102) New algorithms (LSM, BWTree, ...)

...

3) Advanced LOB compression and deduplication - appears to be something similar to PG TOAST (?)?? 6) Hybrid columnar compression (query level, archive level) - tremendous compression rates (up to 50x), 5-15x typical, "query" - improves scan performance, "archive" - for data archives

Best practices

...