https://github.com/apache/iotdb/pull/1890
This experiment measures the compression ratio and time cost of DIFF encoding. The experiment data is generated by the same test classes used for the TS_2DIFF encoding algorithm (in tsfile's test package). It includes quadratic function data (INT and LONG), regular date (LONG), and regular date with missing points (LONG).
Compression ratio
Data set | Size before encoding / byte | Size after encoding / byte | Compression ratio |
quadratic function (INT) | 40,000,000 | 29,857,238 | 1.340 |
quadratic function (LONG) | 80,000,000 | 69,857,238 | 1.145 |
regular date (LONG) | 100,000 | 43,413 | 2.303 |
regular date with missing points (LONG) | 100,000 | 43,811 | 2.283 |
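The ratio column is the size before encoding divided by the size after encoding. A minimal sketch of that arithmetic, using two rows from the table (the class and method names are hypothetical and not part of this PR):

```java
import java.util.Locale;

public class CompressionRatio {
    // Hypothetical helper: ratio = bytes before encoding / bytes after encoding.
    static double ratio(long bytesBefore, long bytesAfter) {
        return (double) bytesBefore / bytesAfter;
    }

    public static void main(String[] args) {
        // Sizes copied from the table above.
        System.out.println(String.format(Locale.ROOT, "%.3f", ratio(40_000_000L, 29_857_238L)));
        System.out.println(String.format(Locale.ROOT, "%.3f", ratio(100_000L, 43_413L)));
    }
}
```

A ratio above 1 means the encoded output is smaller than the input.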
DIFF encoding, like TS_2DIFF, is based on the delta encoding algorithm. It encodes each value as the difference between the current value and the previous value. Data sets that are more regular, with smaller differences between adjacent values, therefore achieve a better compression ratio.
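The idea of delta encoding can be illustrated with a small sketch. This is not the encoder added in this PR: the class `DeltaSketch`, its methods, and the sample arrays are hypothetical, and the sketch only computes the differences without packing them into fewer bits.

```java
import java.util.Arrays;

public class DeltaSketch {
    // Replace each value with its difference from the previous value.
    // The first delta is taken against 0, so it equals the first value.
    static long[] encode(long[] values) {
        long[] deltas = new long[values.length];
        long previous = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous;
            previous = values[i];
        }
        return deltas;
    }

    // Rebuild the original values by accumulating the deltas.
    static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long previous = 0;
        for (int i = 0; i < deltas.length; i++) {
            previous += deltas[i];
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        // Evenly spaced sample: every delta after the first is the same.
        long[] regular = {1000, 2000, 3000, 4000, 5000};
        // Quadratic sample (i * i): the deltas keep growing.
        long[] quadratic = {0, 1, 4, 9, 16, 25};

        System.out.println(Arrays.toString(encode(regular)));
        System.out.println(Arrays.toString(encode(quadratic)));
        System.out.println(Arrays.equals(decode(encode(quadratic)), quadratic));
    }
}
```

In this sketch the evenly spaced input produces identical deltas, while the quadratic input produces deltas that grow with the index. That matches the observation above that regular data with small differences is the more favorable case.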
Time cost
Operation | quadratic function (INT) / ns | quadratic function (LONG) / ns | regular date (LONG) / ns | regular date with missing points (LONG) / ns |
Encode | 1,909,067,200 | 3,043,108,920 | 25,280,460 | 26,435,600 |
Decode | 1,433,093,540 | 3,072,575,720 | 12,295,180 | 12,961,360 |