...

  1. Data Format
  2. Index Prefix Compression
  3. Page Compression
  4. WAL Compression
  5. Column Compression
  6. Columnar Store
  7. Row Compression with Dictionary
  8. Misc Approaches

Data size could be reduced with a more efficient data format: common row metadata could be skipped, numeric values could be encoded to occupy less space, fixed-length strings could be trimmed, and NULL and zero values could be skipped with the help of small per-row bitmaps.
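
A minimal sketch of two of these ideas, a variable-length (LEB128-style) integer encoding and a per-row NULL bitmap, is shown below; the class and method names are illustrative, not part of any existing API:

    import java.io.ByteArrayOutputStream;

    public class CompactRow {
        /** LEB128-style varint: values below 128 take one byte instead of a fixed four. */
        static void writeVarInt(ByteArrayOutputStream out, int v) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // low 7 bits plus a continuation flag
                v >>>= 7;
            }
            out.write(v); // final byte, continuation flag clear
        }

        /** Encode a row of nullable ints: a leading bitmap marks NULLs, which are skipped entirely. */
        static byte[] encode(Integer[] fields) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int bitmap = 0;
            for (int i = 0; i < fields.length; i++)
                if (fields[i] == null)
                    bitmap |= 1 << i; // one bit per field; this sketch assumes at most 8 fields
            out.write(bitmap);
            for (Integer f : fields)
                if (f != null)
                    writeVarInt(out, f);
            return out.toByteArray();
        }
    }

With this layout the row {1, null, 300} encodes to 4 bytes (bitmap 0x02, then varints 0x01 and 0xAC 0x02) instead of 12 bytes of fixed-width integers.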

...

[1] http://www.oracle.com/technetwork/database/features/availability/311358-132337.pdf
[2] https://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx
[3] https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
[4] https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression

Row Compression with Dictionary

Usually data is stored in row format, and there is a lot of overlap between values in different rows: flags, enum values, dates, and strings tend to repeat the same byte sequences from row to row. It is possible to harvest a set of typical rows for a table, build an external dictionary from them, and then reuse this dictionary when writing each subsequent row. This offers only limited benefit for a classical RDBMS, whose row format is low-overhead and relies on fixed field-offset lookups that compression defeats. However, the BinaryObjects used in Ignite are high-overhead, with field and type information repeating in every record, and offset lookups are not used, so row compression can provide a high yield at low cost. In theory the dictionary could be shared between nodes, but keeping a separate dictionary per node looks more practical.
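
The JDK's Deflater/Inflater pair supports exactly this kind of preset dictionary, so a minimal sketch of the idea can stay very small. This illustrates the technique only, it is not the prototype from [2], and the class name is hypothetical:

    import java.util.Arrays;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    /** Per-node row compressor seeded with a dictionary harvested from typical rows. */
    public class DictRowCompressor {
        private final byte[] dictionary;

        public DictRowCompressor(byte[] dictionary) {
            this.dictionary = dictionary;
        }

        /** Compress one serialized row against the shared dictionary. */
        public byte[] compress(byte[] row) {
            Deflater def = new Deflater(Deflater.BEST_SPEED);
            def.setDictionary(dictionary); // seed the LZ77 window with common byte sequences
            def.setInput(row);
            def.finish();
            byte[] buf = new byte[row.length + row.length / 1000 + 64]; // headroom for incompressible rows
            int len = def.deflate(buf);
            def.end();
            return Arrays.copyOf(buf, len);
        }

        /** Decompress one row; the stream itself signals that the preset dictionary is needed. */
        public byte[] decompress(byte[] packed, int originalLen) throws DataFormatException {
            Inflater inf = new Inflater();
            inf.setInput(packed);
            byte[] row = new byte[originalLen];
            int n = inf.inflate(row);
            if (n == 0 && inf.needsDictionary()) {
                inf.setDictionary(dictionary);
                inf.inflate(row);
            }
            inf.end();
            return row;
        }
    }

Storing originalLen in the row header keeps decompression allocation-exact; a real implementation would also have to version dictionaries so that old pages remain readable after a dictionary update.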

Advantages:

  • Easy to implement - no architectural changes
  • Reasonably fast in both writing and reading
  • 2.5x compression on mock data with a naive prototype
  • More data per page - more data fits in RAM - lower latency even if throughput is lower

Disadvantages:

  • Need to keep the dictionary alongside the data
  • Occasionally need to update the dictionary as data evolves
  • Need to keep track of multiple dictionaries per node
  • Pages from different nodes use different dictionaries, which might interfere with checkpointing

Examples:

  1. IBM DB2 supports this approach on a per-table basis [1]
  2. A prototype of dictionary-based row compression for Apache Ignite [2]

[1] https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0052331.html

[2] https://github.com/apache/ignite/pull/4295

Misc Approaches

Compressing Large Values

Large values, such as LOBs and long varchars, cannot be stored in the original data block. Some vendors compress these values and then split them into pieces.
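
A minimal sketch of this compress-then-split order, assuming GZIP and a page-sized piece length (the helper name is illustrative):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    public class LobUtils {
        /** Compress a large value first, then split the compressed bytes into page-sized pieces. */
        static List<byte[]> compressAndSplit(byte[] largeValue, int pieceSize) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(largeValue); // compress the whole value in one pass
            }
            byte[] packed = bos.toByteArray();
            List<byte[]> pieces = new ArrayList<>();
            for (int off = 0; off < packed.length; off += pieceSize)
                pieces.add(Arrays.copyOfRange(packed, off, Math.min(off + pieceSize, packed.length)));
            return pieces;
        }
    }

Compressing before splitting usually yields fewer pieces than the reverse order, since the compressor sees the whole value and no cross-piece redundancy is lost.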

Examples:

...

  • Page Compression - efficient, but the implementation would be complex, with lots of changes to the storage engine
  • or Row Compression with Dictionary - no changes to the storage engine, but adds dictionary management

Phase 3: Excellence

  • Data Format improvements - moderate value for the system, complex to implement; the benefit may be cancelled out once actual compression is in place
  • Column Compression - depends on the new data format

...