...
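- The synthetic dataset used in this experiment is not representative: its encoding and compression efficiency is too low. For the value column, PLAIN encoding even yields the smallest size of all the encodings compared.
- Results on the CRRC ZT11529 dataset show:
  - The real dataset compresses well and its on-disk data volume is relatively small, so the time to load Chunk data from disk is shorter than the time to decompress and decode Page data.
  - The time bottleneck inside step D-1 is sub-step 7_2_data_decompress_PageDataByteArray. Note: in the synthetic-data experiments another sub-step, 7_1_data_ByteBuffer_to_ByteArray(us), also had a high share. The likely reason is that the synthetic data compresses poorly, so 7_2_data_decompress_PageDataByteArray(us) takes relatively little time, which makes the share of 7_1_data_ByteBuffer_to_ByteArray(us) comparatively high.
- How D-1 decompression and D-2 decoding relate to each other, in both space savings and time cost, remains to be explored.
- Write latency could be measured as well.
- Note that RLE encoding is lossy for floating-point values.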
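The lossy behaviour of RLE on floating-point values can be illustrated with a minimal sketch. It assumes that the float path scales each value by a power of ten and rounds it to an integer before run-length encoding; the function names and the two-decimal precision below are hypothetical and are not IoTDB's API.

```python
def to_scaled_int(value: float, max_point_number: int = 2) -> int:
    """Scale by 10**max_point_number and round to an integer (assumed pre-step)."""
    return round(value * 10 ** max_point_number)


def from_scaled_int(scaled: int, max_point_number: int = 2) -> float:
    """Undo the scaling; digits beyond max_point_number cannot be recovered."""
    return scaled / 10 ** max_point_number


original = [3.14159, 3.14162, 2.5]
restored = [from_scaled_int(to_scaled_int(v)) for v in original]

# The first two values collapse to the same number, which helps run-length
# encoding but means the round trip does not return the original data.
print(original)
print(restored)
```

Under this assumption, any value with more decimal digits than the configured precision comes back changed, so lossless comparisons between encodings should exclude RLE on float and double columns or account for the rounding.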
Experiment setup
IoTDB version
- v0.13.1
Experiment environment
- CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
- L1 cache 384KB, L2 cache 1536KB, L3 cache 12MB
- Memory: 16 GB
- Disk: 1.8T HDD /dev/sdb1 mounted on /disk
- OS: Ubuntu 16.04.7 LTS
...
Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv |
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100 |
chunksWritten(cw) | 13 | 13 | 13 | 13 |
timeEncoding(te) | PLAIN | PLAIN | PLAIN | PLAIN |
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN |
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287 |
tsfileSize(MB) | 50.7578783 | 74.71146107 | 75.81546879 | 195.0946026 |
chunkDataSize_stats_mean(MB) | 3.977135579 | 5.846096277 | 5.932730039 | 15.26517868 |
compressedPageSize_stats_mean(B) | 41639.28917 | 61236.7625 | 62145.18333 | 160003 |
uncompressedPageSize_stats_mean(B) | 160003 | 160003 | 160003 | 160003 |
timeBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
total_time(us) | 2096685.5981 | 1244193.0404 | 1195582.1173 | 1895095.384 |
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData | ||||
[Avg&Per] (A)get_chunkMetadatas | 86791.3859 us - 4.1844839300349514% | 100869.9527 us - 8.731470875680406% | 99547.05960000001 us - 7.911992556094877% | 88166.6213 us - 4.659966496653105% |
[Avg&Per] (B)load_on_disk_chunk | 349328.55549999996 us - 16.84222128306944% | 452828.32 us - 39.197572537012675% | 450015.7795 us - 35.76721916082827% | 1155773.8993 us - 61.08737716190697% |
[Avg&Per] (C)get_pageHeader | 2818.293900000001 us - 0.13587875585087916% | 3913.898699999998 us - 0.3387935811871694% | 3502.114700000002 us - 0.27834780402685505% | 4450.306 us - 0.23521687180559223% |
[Avg&Per] (D_1)decompress_pageData | 1350175.8349000001 us - 65.09619618668371% | 261417.90119999985 us - 22.628768326063632% | 395144.6617 us - 31.40606698493612% | 173785.97769999996 us - 9.185299626198828% |
[Avg&Per] (D_2)decode_pageData | 285009.94010000007 us - 13.74121984436101% | 336215.7518000001 us - 29.103394680056127% | 309969.7735 us - 24.636373494113883% | 469824.3798999997 us - 24.832139843435506% |
SUM | 2074124.0103 | 1155245.8243999998 | 1258179.389 | 1892001.1841999996 |
[3] D_1 compare each step inside | ||||
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 4312.080999999998 us - 0.3365746911741179% | 10247.3701 us - 5.205625954425964% | 9355.254500000003 us - 2.648386471251286% | 33088.566800000015 us - 56.71920599957147% |
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 1274619.9271000002 us - 99.48904214184739% | 183796.93149999995 us - 93.36815862249874% | 341381.81159999996 us - 96.64204980983628% | 21318.842899999996 us - 36.54397754446109% |
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 454.68920000000026 us - 0.03549026028736633% | 659.1047999999998 us - 0.334822790636471% | 604.4583000000001 us - 0.17111658310904862% | 973.5250000000001 us - 1.6687808013713306% |
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1779.4489000000008 us - 0.1388929066911369% | 2148.4263999999994 us - 1.091392632438828% | 1902.0298000000007 us - 0.5384471358033915% | 2956.5653000000016 us - 5.068035654596102% |
[3] D_2 compare each step inside | ||||
[Avg&Per] (D-2)8_1_createBatchData(us) | 3343.3923 us - 0.259976288059772% | 3522.578 us - 0.27212097540020236% | 3458.9225 us - 0.2672966393395521% | 3730.5984 us - 0.2816327187407449% |
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 232202.511 us - 18.05565768873081% | 231677.2947 us - 17.897191037883086% | 231911.8133 us - 17.921548782382853% | 237390.8697 us - 17.921263258420133% |
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 254086.67 us - 19.75733129255225% | 255389.3804 us - 19.72896194244707% | 255976.5501 us - 19.78120978179259% | 261059.1494 us - 19.70804415658043% |
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 241634.0221 us - 18.78903535624908% | 242640.7293 us - 18.744122040429612% | 242328.9765 us - 18.726560376256852% | 246610.5964 us - 18.6172847590372% |
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 230100.908 us - 17.892240746329144% | 231053.5736 us - 17.849008259784295% | 231043.8535 us - 17.85447508020484% | 235200.0462 us - 17.755872210542634% |
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 324669.8983 us - 25.24575862807894% | 330206.1447 us - 25.508595744055732% | 329318.78729999997 us - 25.448909340023306% | 340641.1962 us - 25.715902896678855% |
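The figures in the table can be cross-checked with a short calculation. The sketch below copies the mean page sizes and the stage timings from the table (timings rounded to four decimals) and derives the page compression ratio, the D_1 share of SUM, and whether loading from disk is faster than decompression plus decoding. The variable names are illustrative only.

```python
# Mean page sizes in bytes, copied from the table.
UNCOMPRESSED_PAGE_BYTES = 160003
compressed_page_bytes = {
    "GZIP": 41639.28917,
    "LZ4": 61236.7625,
    "SNAPPY": 62145.18333,
    "UNCOMPRESSED": 160003,
}

# Stage times in microseconds, copied from the table:
# (B) load_on_disk_chunk, (D_1) decompress_pageData, (D_2) decode_pageData, SUM.
stage_us = {
    "GZIP": (349328.5555, 1350175.8349, 285009.9401, 2074124.0103),
    "LZ4": (452828.32, 261417.9012, 336215.7518, 1155245.8244),
    "SNAPPY": (450015.7795, 395144.6617, 309969.7735, 1258179.389),
    "UNCOMPRESSED": (1155773.8993, 173785.9777, 469824.3799, 1892001.1842),
}

summary = {}
for codec, (load, d1, d2, total) in stage_us.items():
    summary[codec] = {
        "compression_ratio": UNCOMPRESSED_PAGE_BYTES / compressed_page_bytes[codec],
        "d1_share": d1 / total,
        "load_faster_than_d": load < d1 + d2,
    }

for codec, row in summary.items():
    print(f"{codec:13s} ratio={row['compression_ratio']:.2f} "
          f"D_1 share={row['d1_share']:.1%} load<D: {row['load_faster_than_d']}")
```

With these inputs GZIP shrinks a page by roughly 3.8x and LZ4 and SNAPPY by roughly 2.6x, and loading from disk is faster than D_1 plus D_2 for every compressed configuration but not for UNCOMPRESSED.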
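- With PLAIN encoding on both the timestamp column and the value column, compression alone accounts for all of the size reduction, and both the absolute time and the share of D-1 rise noticeably. Even so, only GZIP pushes the D-1 share above 60% (about 65%); LZ4 and SNAPPY stay at about 23% and 31%, and D-2 decoding still carries a substantial baseline cost (about 14% to 29% of the total).
Varying the value column encoding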
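To make the D-1 and D-2 stages concrete, here is a minimal sketch of reading one PLAIN-encoded page. Several things are assumptions rather than IoTDB code: `zlib` stands in for the GZIP codec, PLAIN values are taken to be 8-byte big-endian, and the page is taken to start with the time buffer length as an unsigned varint (which is consistent with the 160003-byte uncompressed page: 3 + 80000 + 80000). All function names are hypothetical.

```python
import struct
import time
import zlib

POINTS_PER_PAGE = 10000  # pagePointNum used in the experiment


def encode_unsigned_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_unsigned_varint(buf: bytes, pos: int = 0):
    """Return (value, next_position) for a varint starting at pos."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def build_page(times, values) -> bytes:
    """Lay out a page as: varint(len(time buffer)) + time buffer + value buffer."""
    time_buf = struct.pack(f">{len(times)}q", *times)
    value_buf = struct.pack(f">{len(values)}d", *values)
    return encode_unsigned_varint(len(time_buf)) + time_buf + value_buf


def read_page(compressed: bytes, lower_bound: float):
    """Run D-1 (decompress, split) then D-2 (decode, filter, collect) with timings."""
    t0 = time.perf_counter()
    page = zlib.decompress(compressed)            # like 7_2: decompress page bytes
    time_len, pos = decode_unsigned_varint(page)  # like 7_4: split time/value buffers
    time_buf = page[pos:pos + time_len]
    value_buf = page[pos + time_len:]
    t1 = time.perf_counter()

    batch = []                                    # like 8_1: create the batch
    pairs = zip(struct.iter_unpack(">q", time_buf), struct.iter_unpack(">d", value_buf))
    for (t,), (v,) in pairs:                      # like 8_2 to 8_4: decode time and value
        if v >= lower_bound:                      # like 8_5: check the value filter
            batch.append((t, v))                  # like 8_6: put into the batch
    t2 = time.perf_counter()
    return batch, t1 - t0, t2 - t1


times = list(range(POINTS_PER_PAGE))
values = [i * 0.5 for i in range(POINTS_PER_PAGE)]
page = build_page(times, values)
compressed = zlib.compress(page)
batch, d1_seconds, d2_seconds = read_page(compressed, lower_bound=0.0)
print(len(page), len(compressed), len(batch), d1_seconds, d2_seconds)
```

The decode loop touches every point regardless of the codec, which is one plausible reading of why D-2 keeps a baseline cost even when decompression is cheap. The timings printed here come from Python and say nothing about the Java numbers in the table.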
RLValueEncodingRealExpScripts.sh
...