ID | IEP-20
---|---
Author | Vladimir Ozerov
Sponsor | Vladimir Ozerov
Created | 25 Apr 2018
Status |
Compression is used extensively by all database vendors to reduce TCO and improve performance. It can be applied to different parts of the system, from data pages and indexes to the WAL and individual columns. Performance numbers published by other vendors suggest that we can expect a 2x-4x decrease in required disk space and a >1.5x increase in throughput (ops/sec) on typical workloads.
This section describes general compression approaches and their pros and cons. The following compression mechanisms are implemented in practice:
Data size can be reduced with an efficient data format: common row metadata can be skipped, numeric values can be encoded to occupy less space, fixed-length strings can be trimmed, and NULL and zero values can be elided with the help of small per-row bitmaps.
Next, compression can be applied to specific columns, either manually or transparently.
Data pages can be compressed on a per-page basis, which gives a 2x-4x compression rate on typical workloads. Depending on the concrete implementation, pages may be stored in memory in compressed or uncompressed form. File-system-specific features, such as hole punching, can be applied to reduce IO.
Indexes are compressed differently, because the index page structure must be preserved for fast lookups. Prefix compression is a common mechanism.
Extreme compression rates of up to 10x-15x are possible with column store formats, but they are only applicable to historical data with a minimal update rate and are not suitable for lookups.
The WAL can be compressed to decrease log size; in some cases this may also improve overall system throughput.
Last, compression can be applied to special cases, such as large values (LOBs, long strings) and per-page dictionary compression during data load.
Efficient disk usage starts with a proper data layout. Vendors strive to place data in pages in such a way that total overhead is kept as low as possible while still maintaining high read speed. Typically this is achieved with the techniques mentioned above: skipping common row metadata, compact encoding of numeric values, trimming fixed-length strings, and bitmap-based elision of NULL and zero values.
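For illustration, a minimal sketch of such a layout in Java (hypothetical names, not an actual Ignite page format): a per-row bitmap marks which fields are present, so NULL fields occupy no space, and numeric values are varint-encoded so that small numbers take fewer bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical compact row layout: a null bitmap plus varint-encoded values. */
public class CompactRow {
    /** Encodes a row of nullable longs; NULLs occupy only a bit in the bitmap. */
    public static byte[] encode(Long[] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Null bitmap: bit i is set if field i is present (non-NULL).
        int bitmap = 0;
        for (int i = 0; i < fields.length; i++)
            if (fields[i] != null)
                bitmap |= 1 << i;
        out.write(bitmap); // assumes <= 8 fields, for brevity

        // Varint encoding: small values take fewer bytes than a fixed 8-byte slot.
        for (Long f : fields) {
            if (f == null)
                continue;
            long v = f;
            while ((v & ~0x7FL) != 0) {
                out.write((int)(v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write((int)v);
        }
        return out.toByteArray();
    }
}
```

Encoded this way, a row such as {42, NULL, 7} occupies 3 bytes (the bitmap plus two one-byte varints) instead of three fixed 8-byte slots.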
Examples:
[1] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/row-compression-implementation?view=sql-server-2017
[2] https://dev.mysql.com/doc/refman/5.6/en/innodb-physical-record.html
Secondary indexes tend to have entries with common prefixes, e.g. {'IndexedValue', link1}, {'IndexedValue', link2}. The prefix compression technique extracts common prefixes from entries on an index page and places them in a special directory within the page.
Prefix compression can be applied to several index types, and several implementation strategies are possible; a simplified sketch of the page-level prefix directory is shown below.
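For illustration, a minimal sketch of the prefix-directory idea, assuming string keys and a single shared prefix per page (real implementations typically support multiple prefix groups and binary keys):

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical prefix-compressed index page: the shared prefix is stored once. */
public class PrefixCompressedPage {
    final String prefix;         // stored once in a special page directory
    final List<String> suffixes; // per-entry remainders

    PrefixCompressedPage(List<String> sortedKeys) {
        prefix = commonPrefix(sortedKeys);
        suffixes = sortedKeys.stream()
            .map(k -> k.substring(prefix.length()))
            .collect(Collectors.toList());
    }

    /** For sorted keys, the prefix shared by all is the one shared by first and last. */
    static String commonPrefix(List<String> sortedKeys) {
        String first = sortedKeys.get(0), last = sortedKeys.get(sortedKeys.size() - 1);
        int i = 0;
        while (i < first.length() && i < last.length() && first.charAt(i) == last.charAt(i))
            i++;
        return first.substring(0, i);
    }

    /** Reconstructs the full key of entry i on lookup. */
    String key(int i) {
        return prefix + suffixes.get(i);
    }
}
```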
Examples:
[1] https://blogs.oracle.com/dbstorage/compressing-your-indexes:-index-key-compression-part-1
[2] https://blogs.oracle.com/dbstorage/compressing-your-indexes:-advanced-index-compression-part-2
[3] https://docs.mongodb.com/manual/core/wiredtiger/#storage-wiredtiger-compression
Whole pages can be compressed, which gives a 2x-4x reduction in size on average. Two different approaches are used in practice: without in-memory compression and with in-memory compression.
Data is stored in memory as is, in uncompressed form. When it is time to flush data to disk, compression is applied. If the data size is reduced significantly, the data is stored in compressed form; otherwise it is stored in plain form (compression failure). Big block sizes (e.g. 32Kb) are typically used in this case to achieve higher compression rates, while data is still written to disk in blocks of smaller sizes. E.g. one may have a 32Kb block in memory which is compressed to 7Kb and then written as two 4Kb blocks to disk. Vendors allow selecting the compression algorithm (Snappy, zlib, LZ4, etc.).
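A minimal sketch of this flush path, using the JDK's Deflater as a stand-in for the pluggable codec (block sizes and the fallback policy are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.zip.Deflater;

/** Hypothetical flush-time page compression with fallback to the raw page. */
public class PageFlusher {
    static final int DISK_BLOCK = 4 * 1024;   // on-disk IO unit
    static final int PAGE_SIZE  = 32 * 1024;  // in-memory page size

    /** Returns the bytes to write: compressed only if it saves at least one disk block. */
    static byte[] prepareForFlush(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page);
        deflater.finish();
        byte[] buf = new byte[page.length];
        int len = deflater.deflate(buf);
        boolean fits = deflater.finished(); // false: compressed output didn't fit
        deflater.end();

        // Compression "failure": keep the plain page unless a full block is saved.
        if (!fits || blocks(len) >= blocks(page.length))
            return page;

        // Pad up to whole disk blocks, e.g. a 32Kb page -> 7Kb -> two 4Kb blocks.
        return Arrays.copyOf(buf, blocks(len) * DISK_BLOCK);
    }

    static int blocks(int bytes) {
        return (bytes + DISK_BLOCK - 1) / DISK_BLOCK;
    }
}
```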
Page compression is not applicable to index pages because it incurs a serious slowdown on reads.
Hole punching with fallocate [1] might be added if the underlying file system supports it. In this case the compressed block is written as is, and the empty space is then trimmed with a separate system call. E.g. if a 32Kb block is compressed to 6.5Kb, the whole 32Kb is written as is, and then 32 - 7 = 25Kb are released.
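A sketch of the trimming step, assuming a 64-bit Linux system and the JNA library for the native binding; the helper class and the fd plumbing are hypothetical, and the released amount depends on the file system's block granularity:

```java
import com.sun.jna.Native;

/** Hypothetical hole-punching helper: releases the unused tail of a written page. */
public class HolePuncher {
    static { Native.register("c"); } // bind the native method below to libc

    // fallocate(2); mode flags per <linux/falloc.h>.
    static native int fallocate(int fd, int mode, long offset, long length);

    static final int FALLOC_FL_KEEP_SIZE  = 0x01; // do not change the file size
    static final int FALLOC_FL_PUNCH_HOLE = 0x02; // deallocate the byte range

    static final long FS_BLOCK = 4 * 1024; // file system allocation unit

    /**
     * After writing a full page at {@code offset} whose payload compressed to
     * {@code usedBytes}, punch a hole over the unused tail. E.g. with 4Kb blocks,
     * a 32Kb page with 6.5Kb of payload keeps two blocks and releases 24Kb.
     */
    static void trim(int fd, long offset, long pageSize, long usedBytes) {
        long keep = ((usedBytes + FS_BLOCK - 1) / FS_BLOCK) * FS_BLOCK;
        if (keep < pageSize)
            fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                offset + keep, pageSize - keep);
    }
}
```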
Advantages:
Disadvantages:
Examples:
[1] http://man7.org/linux/man-pages/man2/fallocate.2.html
[2] https://mariadb.org/innodb-holepunch-compression-vs-the-filesystem-in-mariadb-10-1/
[3] https://dev.mysql.com/doc/refman/5.7/en/innodb-compression-background.html
[4] https://mysqlserverteam.com/innodb-transparent-page-compression/
[5] https://www.[…]/company/mongodb-3-0-wiredtiger-compression-and-performance/
[6] https://[…]/company/postgrespro/blog/337180/
[7] https://www.percona.com/blog/2017/11/20/innodb-page-compression/
Data is stored in memory in compressed form. Incoming data is first accommodated in blocks in raw form; when a certain threshold is reached, the block is compressed, and more records can then be added to it. The original row structure may be maintained to a certain extent, so that subsequent reads do not need to uncompress the data.
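A minimal sketch of this hybrid scheme, with made-up names and identity codec stubs standing in for a real compressor:

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical in-memory block: compressed body plus a raw, appendable tail. */
public class HybridBlock {
    static final int RAW_THRESHOLD = 8 * 1024; // compact once this much raw data piles up

    byte[] body = new byte[0];                          // compressed part
    final ByteArrayOutputStream rawTail = new ByteArrayOutputStream();

    /** New rows land in the raw tail; reads of recent rows need no decompression. */
    void append(byte[] row) {
        rawTail.write(row, 0, row.length);
        if (rawTail.size() >= RAW_THRESHOLD)
            compactTail();
    }

    /** Merges the raw tail into the compressed body, freeing the tail for new rows. */
    void compactTail() {
        byte[] merged = concat(decompress(body), rawTail.toByteArray());
        body = compress(merged);
        rawTail.reset();
    }

    // Identity stubs for brevity; a real block would use e.g. java.util.zip here
    // and keep enough row structure to answer reads from the compressed form.
    static byte[] compress(byte[] b)   { return b; }
    static byte[] decompress(byte[] b) { return b; }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] r = java.util.Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, r, a.length, b.length);
        return r;
    }
}
```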
Advantages:
Disadvantages:
Examples:
[1] https://blogs.oracle.com/dbstorage/updates-in-row-compressed-tables
[2] https://blogs.oracle.com/dbstorage/advanced-row-compression-improvements-with-oracle-database-12c-release-2
[3] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/page-compression-implementation?view=sql-server-2017
The write-ahead log records all data changes in a journal; the data is read back only during crash recovery. Data chunks being written to the journal may be compressed. This reduces the number of disk writes and saves WAL space, increasing the likelihood of a delta update in case of a temporary node shutdown. Compression can be applied to specific records: the larger the record, the more savings are expected. Compression typically takes more time than decompression, so operation latency may increase. However, a smaller number of IO calls typically increases overall system throughput, because disk resources are usually scarcer than CPU.
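For illustration, a sketch of threshold-based record compression on the WAL write path; the one-byte record envelope and the size threshold are assumptions, and the JDK's Deflater stands in for whatever codec is chosen:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

/** Hypothetical WAL writer step: compress only records large enough to pay off. */
public class WalRecordCompressor {
    static final int MIN_SIZE = 512; // small records are not worth compressing

    /** Returns [flag byte | payload]; the flag says whether the payload is deflated. */
    static byte[] envelope(byte[] record) {
        if (record.length < MIN_SIZE)
            return withFlag((byte)0, record);

        Deflater d = new Deflater(Deflater.BEST_SPEED); // favor latency over ratio
        d.setInput(record);
        d.finish();
        byte[] buf = new byte[record.length];
        int len = d.deflate(buf);
        boolean ok = d.finished() && len < record.length; // only keep if it shrank
        d.end();

        return ok ? withFlag((byte)1, Arrays.copyOf(buf, len))
                  : withFlag((byte)0, record);
    }

    static byte[] withFlag(byte flag, byte[] payload) {
        byte[] r = new byte[payload.length + 1];
        r[0] = flag;
        System.arraycopy(payload, 0, r, 1, payload.length);
        return r;
    }
}
```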
Examples:
[1] https://www.pgcon.org/2016/schedule/attachments/432_WAL-Reduction.pdf
[2] https://mariadb.com/kb/en/library/compressing-events-to-reduce-size-of-the-binary-log/
[3] https://docs.mongodb.com/manual/core/wiredtiger/#storage-wiredtiger-journal
It is possible to compress only specific columns. This can be done either manually, using built-in functions, or transparently, if a column compression hint is defined in CREATE TABLE or ALTER TABLE commands.
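As an example of the manual variant, MySQL's built-in COMPRESS()/UNCOMPRESS() functions [2] can be applied to a single column from JDBC; the connection settings and the events table below are made up for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Manual column compression via MySQL's built-in functions (reference [2]). */
public class CompressedColumnExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "pass")) {
            // Compress the large column on write...
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO events(id, payload) VALUES (?, COMPRESS(?))")) {
                ins.setLong(1, 1L);
                ins.setString(2, "...large payload...");
                ins.executeUpdate();
            }
            // ...and decompress on read.
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT UNCOMPRESS(payload) FROM events WHERE id = ?")) {
                sel.setLong(1, 1L);
                try (ResultSet rs = sel.executeQuery()) {
                    while (rs.next())
                        System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```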
Advantages:
Disadvantages:
Examples:
[1] https://docs.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-2017
[2] https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress
[3] https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html
[4] https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/
Usually data is stored in row format, where all attributes of a row are stored together. An alternative approach is to store data column-wise: all values of a specific column for a set of rows are placed near each other. This improves compression rates dramatically, up to 10x, saving a lot of space and improving scan speed, especially in OLAP cases. However, it suffers from read amplification on row lookups and write amplification on updates. Hence, it is usually applied to cold data.
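A toy illustration of why the columnar layout compresses so well: values of one column are stored adjacently, so even trivial schemes such as run-length encoding become effective (the class below is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy column-store encoding: run-length encode one column's values. */
public class RleColumn {
    /** Encodes e.g. country codes [US, US, US, DE, DE] as [(US, 3), (DE, 2)]. */
    static List<long[]> encode(long[] column) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i;
            while (j < column.length && column[j] == column[i])
                j++;
            runs.add(new long[] {column[i], j - i}); // (value, run length)
            i = j;
        }
        return runs;
    }
}
```

In a row store the same values would be interleaved with other attributes, and such runs would rarely occur.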
Advantages:
Disadvantages:
Examples:
[1] http://www.oracle.com/technetwork/database/features/availability/311358-132337.pdf
[2] https://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx
[3] https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
[4] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/[…]columnstore-archive-compression
Usually data is stored in row format, and there is a lot of overlap between values in different rows: flags, enum values, dates, or strings can have the same byte sequences repeating from row to row. It is possible to harvest a set of typical rows for a table, create an external dictionary based on them, and then reuse this dictionary when writing each subsequent row. This offers only limited benefits for classical RDBMSs, since their row format is low-overhead and relies on fixed field offset lookups, which are defeated by compression. However, the BinaryObjects used in Ignite are high-overhead, with field/type information repeating in every record, and offset lookups are not used, so row compression can provide a high yield at low overhead. In theory it is possible to share a dictionary between nodes, but having separate dictionaries looks more practical.
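A minimal sketch of the per-row dictionary idea using the JDK's built-in preset-dictionary support (zlib); harvesting the dictionary from typical rows is out of scope here, and the class itself is hypothetical:

```java
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch of per-row compression with a preset dictionary (JDK built-in zlib). */
public class RowDictionaryCompression {
    /** Compresses one row against a dictionary harvested from typical rows. */
    static byte[] compress(byte[] row, byte[] dict) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dict); // shared dictionary, stored once per table/node
        deflater.setInput(row);
        deflater.finish();
        byte[] buf = new byte[row.length * 2 + 64]; // generously sized for a sketch
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    /** Decompresses one row; maxLen is a caller-known upper bound on the row size. */
    static byte[] decompress(byte[] blob, byte[] dict, int maxLen) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(blob);
        byte[] out = new byte[maxLen];
        int n = inflater.inflate(out);
        if (n == 0 && inflater.needsDictionary()) { // zlib asks for the preset dictionary
            inflater.setDictionary(dict);
            n = inflater.inflate(out);
        }
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}
```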
Advantages:
Disadvantages:
Examples:
[2] https://github.com/apache/ignite/pull/4295
Large values, such as LOBs and long varchars, cannot be stored in the original data block. Some vendors compress these values and then split them into pieces.
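A sketch of the compress-then-split step; the chunk size, the helper class, and storing chunks as a chain of overflow blocks are illustrative assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;

/** Hypothetical LOB storage step: compress the value, then split it into chunks. */
public class LobChunker {
    static final int CHUNK = 8 * 1024; // fits a data block after per-chunk headers

    static List<byte[]> toChunks(byte[] lob) {
        byte[] compressed = compress(lob);
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < compressed.length; off += CHUNK)
            chunks.add(Arrays.copyOfRange(compressed, off,
                Math.min(off + CHUNK, compressed.length)));
        return chunks; // would be stored as a linked chain of overflow blocks
    }

    static byte[] compress(byte[] b) {
        Deflater d = new Deflater();
        d.setInput(b);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
}
```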
Examples:
[1] https://wiki.postgresql.org/wiki/TOAST
[2] https://docs.oracle.com/database/121/ADLOB/adlob_smart.htm#ADLOB45944
[3] https://docs.microfocus.com/itom/Network_Automation:10.50/Administer/DB_Compression/ConfiguringLOB_Oracle
Oracle attempts to compress values during data load (Direct Path, CTAS, INSERT ... APPEND) [1]. Compression is applied on a per-block basis using a dictionary approach. Oracle may decide to skip compression if there is no benefit. Alternatively, it may reorder attributes in rows to obtain longer common prefixes and improve the compression ratio.
[1] https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/
Some approaches add more value than others; some are hard to implement, some are easy. For this reason compression should be implemented in phases, with the most efficient and simple techniques being developed first. Proposed plan:
The following changes are not likely to be implemented in the near term due to their complexity and/or limited impact on general use cases:
TBD
TBD