ID | IEP-20
---|---
Author | Vladimir Ozerov
Sponsor | Vladimir Ozerov
Created | 25 Apr 2018
Status |
Compression is used extensively by all database vendors to reduce TCO and improve performance. It can be applied to different parts of the system, from data pages and indexes to the WAL and individual columns. Performance numbers published by other vendors suggest that we can expect a 2x-4x decrease in required disk space and a >1.5x increase in throughput (ops/sec) on typical workloads.
This section describes general compression approaches and their pros and cons. The following compression mechanisms are implemented in practice:
Data size can be reduced with an efficient data format: common row metadata can be skipped, numeric values can be encoded to occupy less space, fixed-length strings can be trimmed, and NULL and zero values can be elided with the help of small per-row bitmaps.
Next, compression can be applied to specific columns, either manually or transparently.
Data pages can be compressed on a per-page basis, which gives a 2x-4x compression rate on typical workloads. Depending on the concrete implementation, pages may be stored in memory in compressed or uncompressed form. File-system-specific features, such as hole punching, can be applied to reduce IO.
Indexes are compressed differently, because the index page structure must be preserved for fast lookups. Prefix compression is a common mechanism.
Extreme compression rates of up to 10x-15x are possible with column store formats, but they are only applicable to historical data with a minimal update rate and are not suitable for lookups.
The WAL can be compressed to decrease log size; in some cases this may also improve overall system throughput.
Last, compression can be applied to special cases, such as large values (LOBs, long strings) and per-page dictionary compression during data load.
Efficient disk usage starts with a proper data layout. Vendors strive to place data in pages in such a way that total overhead is kept as low as possible while still maintaining high read speed. Typically this is achieved with the techniques mentioned above: skipping common row metadata, compact encoding of numeric values, trimming fixed-length strings, and bitmap-based elision of NULL and zero values.
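For illustration, a minimal sketch of such a layout in Java (hypothetical names, not an actual Ignite page format): a per-row bitmap marks which fields are present, so NULL fields occupy no space, and numeric values are varint-encoded so that small numbers take fewer bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical compact row layout: a null bitmap plus varint-encoded values. */
public class CompactRow {
    /** Encodes a row of nullable longs; NULLs occupy only a bit in the bitmap. */
    public static byte[] encode(Long[] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Null bitmap: bit i is set if field i is present (non-NULL).
        int bitmap = 0;
        for (int i = 0; i < fields.length; i++)
            if (fields[i] != null)
                bitmap |= 1 << i;
        out.write(bitmap); // assumes <= 8 fields, for brevity

        // Varint encoding: small values take fewer bytes than a fixed 8-byte slot.
        for (Long f : fields) {
            if (f == null)
                continue;
            long v = f;
            while ((v & ~0x7FL) != 0) {
                out.write((int)(v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write((int)v);
        }
        return out.toByteArray();
    }
}
```

Encoded this way, a row such as {42, NULL, 7} occupies 3 bytes (the bitmap plus two one-byte varints) instead of three fixed 8-byte slots.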
Examples:
[1] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/row-compression-implementation?view=sql-server-2017
[2] https://dev.mysql.com/doc/refman/5.6/en/innodb-physical-record.html
Secondary indexes tend to have entries with common prefixes, e.g. {'IndexedValue', link1}, {'IndexedValue', link2}. The prefix compression technique extracts common prefixes from entries on an index page and places them in a special directory within the page.
Prefix compression can be applied to several index types, and several implementation strategies are possible; a simplified sketch of the page-level prefix directory is shown below.
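For illustration, a minimal sketch of the prefix-directory idea, assuming string keys and a single shared prefix per page (real implementations typically support multiple prefix groups and binary keys):

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical prefix-compressed index page: the shared prefix is stored once. */
public class PrefixCompressedPage {
    final String prefix;         // stored once in a special page directory
    final List<String> suffixes; // per-entry remainders

    PrefixCompressedPage(List<String> sortedKeys) {
        prefix = commonPrefix(sortedKeys);
        suffixes = sortedKeys.stream()
            .map(k -> k.substring(prefix.length()))
            .collect(Collectors.toList());
    }

    /** For sorted keys, the prefix shared by all is the one shared by first and last. */
    static String commonPrefix(List<String> sortedKeys) {
        String first = sortedKeys.get(0), last = sortedKeys.get(sortedKeys.size() - 1);
        int i = 0;
        while (i < first.length() && i < last.length() && first.charAt(i) == last.charAt(i))
            i++;
        return first.substring(0, i);
    }

    /** Reconstructs the full key of entry i on lookup. */
    String key(int i) {
        return prefix + suffixes.get(i);
    }
}
```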
Examples:
[1] https://blogs.oracle.com/dbstorage/compressing-your-indexes:-index-key-compression-part-1
[2] https://blogs.oracle.com/dbstorage/compressing-your-indexes:-advanced-index-compression-part-2
[3] https://docs.mongodb.com/manual/core/wiredtiger/#storage-wiredtiger-compression
Whole pages can be compressed, which gives a 2x-4x reduction in size on average. Two different approaches are used in practice: without in-memory compression and with in-memory compression.
Data is stored in memory as is, in uncompressed form. When it is time to flush data to disk, compression is applied. If the data size is reduced significantly, the data is stored in compressed form; otherwise it is stored in plain form (compression failure). Big block sizes (e.g. 32Kb) are typically used in this case to achieve higher compression rates, while data is still written to disk in blocks of smaller sizes. E.g. one may have a 32Kb block in memory which is compressed to 7Kb and then written as two 4Kb blocks to disk. Vendors allow selecting the compression algorithm (Snappy, zlib, LZ4, etc.).
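A minimal sketch of this flush path, using the JDK's Deflater as a stand-in for the pluggable codec (block sizes and the fallback policy are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.zip.Deflater;

/** Hypothetical flush-time page compression with fallback to the raw page. */
public class PageFlusher {
    static final int DISK_BLOCK = 4 * 1024;   // on-disk IO unit
    static final int PAGE_SIZE  = 32 * 1024;  // in-memory page size

    /** Returns the bytes to write: compressed only if it saves at least one disk block. */
    static byte[] prepareForFlush(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page);
        deflater.finish();
        byte[] buf = new byte[page.length];
        int len = deflater.deflate(buf);
        boolean fits = deflater.finished(); // false: compressed output didn't fit
        deflater.end();

        // Compression "failure": keep the plain page unless a full block is saved.
        if (!fits || blocks(len) >= blocks(page.length))
            return page;

        // Pad up to whole disk blocks, e.g. a 32Kb page -> 7Kb -> two 4Kb blocks.
        return Arrays.copyOf(buf, blocks(len) * DISK_BLOCK);
    }

    static int blocks(int bytes) {
        return (bytes + DISK_BLOCK - 1) / DISK_BLOCK;
    }
}
```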
Page compression is not applicable to index pages because it incurs a serious slowdown on reads.
Hole punching with fallocate [1] might be added if the underlying file system supports it. In this case the compressed block is written as is, and the empty space is then trimmed with a separate system call. E.g. if a 32Kb block is compressed to 6.5Kb, the whole 32Kb is written as is, and then 32 - 7 = 25Kb are released.
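A sketch of the trimming step, assuming a 64-bit Linux system and the JNA library for the native binding; the helper class and the fd plumbing are hypothetical, and the released amount depends on the file system's block granularity:

```java
import com.sun.jna.Native;

/** Hypothetical hole-punching helper: releases the unused tail of a written page. */
public class HolePuncher {
    static { Native.register("c"); } // bind the native method below to libc

    // fallocate(2); mode flags per <linux/falloc.h>.
    static native int fallocate(int fd, int mode, long offset, long length);

    static final int FALLOC_FL_KEEP_SIZE  = 0x01; // do not change the file size
    static final int FALLOC_FL_PUNCH_HOLE = 0x02; // deallocate the byte range

    static final long FS_BLOCK = 4 * 1024; // file system allocation unit

    /**
     * After writing a full page at {@code offset} whose payload compressed to
     * {@code usedBytes}, punch a hole over the unused tail. E.g. with 4Kb blocks,
     * a 32Kb page with 6.5Kb of payload keeps two blocks and releases 24Kb.
     */
    static void trim(int fd, long offset, long pageSize, long usedBytes) {
        long keep = ((usedBytes + FS_BLOCK - 1) / FS_BLOCK) * FS_BLOCK;
        if (keep < pageSize)
            fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                offset + keep, pageSize - keep);
    }
}
```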
Advantages:
Disadvantages:
Examples:
[1] http://man7.org/linux/man-pages/man2/fallocate.2.html
[2] https://mariadb.org/innodb-holepunch-compression-vs-the-filesystem-in-mariadb-10-1/
[3] https://dev.mysql.com/doc/refman/5.7/en/innodb-compression-background.html
[4] https://mysqlserverteam.com/innodb-transparent-page-compression/
[5] https://www.[…]/company/mongodb-3-0-wiredtiger-compression-and-performance/
[6] https://[…]/company/postgrespro/blog/337180/
[7] https://www.percona.com/blog/2017/11/20/innodb-page-compression/
Data is stored in memory in compressed form. Incoming data is first accommodated in blocks in raw form; when a certain threshold is reached, the block is compressed, and more records can then be added to it. The original row structure may be maintained to a certain extent, so that subsequent reads do not need to uncompress the data.
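A minimal sketch of this hybrid scheme, with made-up names and identity codec stubs standing in for a real compressor:

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical in-memory block: compressed body plus a raw, appendable tail. */
public class HybridBlock {
    static final int RAW_THRESHOLD = 8 * 1024; // compact once this much raw data piles up

    byte[] body = new byte[0];                          // compressed part
    final ByteArrayOutputStream rawTail = new ByteArrayOutputStream();

    /** New rows land in the raw tail; reads of recent rows need no decompression. */
    void append(byte[] row) {
        rawTail.write(row, 0, row.length);
        if (rawTail.size() >= RAW_THRESHOLD)
            compactTail();
    }

    /** Merges the raw tail into the compressed body, freeing the tail for new rows. */
    void compactTail() {
        byte[] merged = concat(decompress(body), rawTail.toByteArray());
        body = compress(merged);
        rawTail.reset();
    }

    // Identity stubs for brevity; a real block would use e.g. java.util.zip here
    // and keep enough row structure to answer reads from the compressed form.
    static byte[] compress(byte[] b)   { return b; }
    static byte[] decompress(byte[] b) { return b; }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] r = java.util.Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, r, a.length, b.length);
        return r;
    }
}
```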
Advantages:
Disadvantages:
Examples:
[1] https://blogs.oracle.com/dbstorage/updates-in-row-compressed-tables
[2] https://blogs.oracle.com/dbstorage/advanced-row-compression-improvements-with-oracle-database-12c-release-2
[3] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/page-compression-implementation?view=sql-server-2017
The write-ahead log records all data changes in a journal; the data is read back only during crash recovery. Data chunks being written to the journal may be compressed. This reduces the number of disk writes and saves WAL space, increasing the likelihood of a delta update in case of a temporary node shutdown. Compression can be applied to specific records: the larger the record, the more savings are expected. Compression typically takes more time than decompression, so operation latency may increase. However, a smaller number of IO calls typically increases overall system throughput, because disk resources are usually scarcer than CPU.
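For illustration, a sketch of threshold-based record compression on the WAL write path; the one-byte record envelope and the size threshold are assumptions, and the JDK's Deflater stands in for whatever codec is chosen:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

/** Hypothetical WAL writer step: compress only records large enough to pay off. */
public class WalRecordCompressor {
    static final int MIN_SIZE = 512; // small records are not worth compressing

    /** Returns [flag byte | payload]; the flag says whether the payload is deflated. */
    static byte[] envelope(byte[] record) {
        if (record.length < MIN_SIZE)
            return withFlag((byte)0, record);

        Deflater d = new Deflater(Deflater.BEST_SPEED); // favor latency over ratio
        d.setInput(record);
        d.finish();
        byte[] buf = new byte[record.length];
        int len = d.deflate(buf);
        boolean ok = d.finished() && len < record.length; // only keep if it shrank
        d.end();

        return ok ? withFlag((byte)1, Arrays.copyOf(buf, len))
                  : withFlag((byte)0, record);
    }

    static byte[] withFlag(byte flag, byte[] payload) {
        byte[] r = new byte[payload.length + 1];
        r[0] = flag;
        System.arraycopy(payload, 0, r, 1, payload.length);
        return r;
    }
}
```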
Examples:
[1] https://www.pgcon.org/2016/schedule/attachments/432_WAL-Reduction.pdf
[2] https://mariadb.com/kb/en/library/compressing-events-to-reduce-size-of-the-binary-log/
[3] https://docs.mongodb.com/manual/core/wiredtiger/#storage-wiredtiger-journal
It is possible to compress only specific columns. This can be done either manually, using built-in functions, or transparently, if a column compression hint is defined in CREATE TABLE or ALTER TABLE commands.
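As an example of the manual variant, MySQL's built-in COMPRESS()/UNCOMPRESS() functions [2] can be applied to a single column from JDBC; the connection settings and the events table below are made up for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Manual column compression via MySQL's built-in functions (reference [2]). */
public class CompressedColumnExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "pass")) {
            // Compress the large column on write...
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO events(id, payload) VALUES (?, COMPRESS(?))")) {
                ins.setLong(1, 1L);
                ins.setString(2, "...large payload...");
                ins.executeUpdate();
            }
            // ...and decompress on read.
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT UNCOMPRESS(payload) FROM events WHERE id = ?")) {
                sel.setLong(1, 1L);
                try (ResultSet rs = sel.executeQuery()) {
                    while (rs.next())
                        System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```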
Advantages:
Disadvantages:
Examples:
[1] https://docs.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-2017
[2] https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress
[3] https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html
[4] https://mariadb.com/kb/en/library/storage-engine-independent-column-compression/
Usually data is stored in row format, where all attributes of a row are stored together. An alternative approach is to store data column-wise: all values of a specific column for a set of rows are placed near each other. This improves compression rates dramatically, up to 10x, saving a lot of space and improving scan speed, especially in OLAP cases. However, it suffers from read amplification on row lookups and write amplification on updates. Hence, it is usually applied to cold data.
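A toy illustration of why the columnar layout compresses so well: values of one column are stored adjacently, so even trivial schemes such as run-length encoding become effective (the class below is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy column-store encoding: run-length encode one column's values. */
public class RleColumn {
    /** Encodes e.g. country codes [US, US, US, DE, DE] as [(US, 3), (DE, 2)]. */
    static List<long[]> encode(long[] column) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i;
            while (j < column.length && column[j] == column[i])
                j++;
            runs.add(new long[] {column[i], j - i}); // (value, run length)
            i = j;
        }
        return runs;
    }
}
```

In a row store the same values would be interleaved with other attributes, and such runs would rarely occur.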
Advantages:
Disadvantages:
Examples:
[1] http://www.oracle.com/technetwork/database/features/availability/311358-132337.pdf
[2] https://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx
[3] https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
[4] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/[…]columnstore-archive-compression
Usually data is stored in row format, and there is a lot of overlap between values in different rows: flags, enum values, dates, or strings can have the same byte sequences repeating from row to row. It is possible to harvest a set of typical rows for a table, create an external dictionary based on them, and then reuse this dictionary when writing each subsequent row. This offers only limited benefits for classical RDBMSs, since their row format is low-overhead and relies on fixed field offset lookups, which are defeated by compression. However, the BinaryObjects used in Ignite are high-overhead, with field/type information repeating in every record, and offset lookups are not used, so row compression can provide a high yield at low overhead. In theory it is possible to share a dictionary between nodes, but having separate dictionaries looks more practical.
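A minimal sketch of the per-row dictionary idea using the JDK's built-in preset-dictionary support (zlib); harvesting the dictionary from typical rows is out of scope here, and the class itself is hypothetical:

```java
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch of per-row compression with a preset dictionary (JDK built-in zlib). */
public class RowDictionaryCompression {
    /** Compresses one row against a dictionary harvested from typical rows. */
    static byte[] compress(byte[] row, byte[] dict) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dict); // shared dictionary, stored once per table/node
        deflater.setInput(row);
        deflater.finish();
        byte[] buf = new byte[row.length * 2 + 64]; // generously sized for a sketch
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    /** Decompresses one row; maxLen is a caller-known upper bound on the row size. */
    static byte[] decompress(byte[] blob, byte[] dict, int maxLen) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(blob);
        byte[] out = new byte[maxLen];
        int n = inflater.inflate(out);
        if (n == 0 && inflater.needsDictionary()) { // zlib asks for the preset dictionary
            inflater.setDictionary(dict);
            n = inflater.inflate(out);
        }
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}
```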
Advantages:
Disadvantages:
Examples:
[2] https://github.com/apache/ignite/pull/4295
Large values, such as LOBs and long varchars, cannot be stored in the original data block. Some vendors compress these values and then split them into pieces.
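A sketch of the compress-then-split step; the chunk size, the helper class, and storing chunks as a chain of overflow blocks are illustrative assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;

/** Hypothetical LOB storage step: compress the value, then split it into chunks. */
public class LobChunker {
    static final int CHUNK = 8 * 1024; // fits a data block after per-chunk headers

    static List<byte[]> toChunks(byte[] lob) {
        byte[] compressed = compress(lob);
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < compressed.length; off += CHUNK)
            chunks.add(Arrays.copyOfRange(compressed, off,
                Math.min(off + CHUNK, compressed.length)));
        return chunks; // would be stored as a linked chain of overflow blocks
    }

    static byte[] compress(byte[] b) {
        Deflater d = new Deflater();
        d.setInput(b);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
}
```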
Examples:
[1] https://wiki.postgresql.org/wiki/TOAST
[2] https://docs.oracle.com/database/121/ADLOB/adlob_smart.htm#ADLOB45944
[3] https://docs.microfocus.com/itom/Network_Automation:10.50/Administer/DB_Compression/ConfiguringLOB_Oracle
Oracle attempts to compress values during data load (Direct Path, CTAS, INSERT ... APPEND) [1]. Compression is applied on a per-block basis using a dictionary approach. Oracle may decide to skip compression if there is no benefit. Alternatively, it may reorder attributes in rows to obtain longer common prefixes and improve the compression ratio.
[1] https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/
Some approaches add more value than others; some are hard to implement, some are easy. For this reason compression should be implemented in phases, with the most efficient and simple techniques being developed first. Proposed plan:
The following changes are not likely to be implemented in the near term due to their complexity and/or limited impact on general use cases:
TBD
TBD