...
Performance numbers of other vendors demonstrate that we could expect 2x-4x decrease in required disk space and >1.5x increase in throughput (ops/sec) on typical workloads.
...
This section describes general compression approaches and their pros and cons. The following compression mechanisms are implemented in practice:
Data size could be reduced with efficient data format. Common row metadata could be skipped. Numeric values could be encoded to occupy less space. Fixed-length strings could be trimmed. NULL and zero values could be skipped with help of small row bitmaps.
...
Efficient disk usage starts with proper data layout. Vendors strive to place data in pages in such a way that total overhead is kept as low as possible while still maintaining high read speed. Typically this is achieved as follows:
...
Examples:
...
Prefix compression could be applied to the following index types:
Prefix compression could be implemented as follows:
Examples:
...
[1] http://www.oracle.com/technetwork/database/features/availability/311358-132337.pdf
[2] https://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx
[3] https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
[4] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression
Usually data is stored in row format, and there's a lot of overlap between values in different rows. Flags, enum values, dates or strings can have same byte sequences repeating from row to row. It is possible to harvest a set of typical rows for a table, create an external dictionary based on them, and then reuse this dictionary when writing each next row. This only offers limited benefits for classical RDBMS since their row format is low-overhead and with fixed field offset lookups, which are defeated by compression. However, BinaryObjects used in Ignite are high-overhead, with field/type information repeating in every record, and offset lookups are not used. Row compression can provide high yield with low overhead. In theory it is possible to share dictionary between nodes, but having separate dictionaries look more practical.
Advantages:
Disadvantages:
Examples:
[2] https://github.com/apache/ignite/pull/4295
Large values, such as LOBs and long varchars, cannot be stored in original data block. Some vendors compress these values Large values, such as LOBs and long varchars, cannot be stored in original data block. Some vendors compress these values and then split into pieces.
...
[1] https://wiki.postgresql.org/wiki/TOAST
[2] https://docs.oracle.com/database/121/ADLOB/adlob_smart.htm#ADLOB45944
[3] https://docs.microfocus.com/itom/Network_Automation:10.50/Administer/DB_Compression/ConfiguringLOB_Oracle
...
Oracle attempts to compress values during data load (Direct Path, CTAS, INSERT ... APPEND) [1]. Compression is applied on per-block basis using dictionary approach. Oracle may decide to skip compression if there are no benefits. Alternatively, it may reorder attrbutes in rows to get longer common prefixes and improve compression ratio.
[1] https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/
TBD
...
CTAS, INSERT ... APPEND) [1]. Compression is applied on per-block basis using dictionary approach. Oracle may decide to skip compression if there are no benefits. Alternatively, it may reorder attrbutes in rows to get longer common prefixes and improve compression ratio.
[1] https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/
Some approaches adds more value than others. Some approaches are hard to implement, some are easy. For this reason compression should be implemented in phases, with the most efficient and simple techniques being developed first. Proposed plan:
The following changes are not likely to be implemented in the nearest time due to their complexity and/or limited impact on general use cases:
Jira | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|