...

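  1. The synthetic dataset used in this experiment is not representative. Its timestamps start at 1 and increase in steps of 1, and its values are random integers in [0,100). As a result, encoding/compression is unrealistically efficient for the timestamp column and unrealistically inefficient for the value column. For the value column, PLAIN encoding even produces the smallest size of all the encodings.
  2. Results on the CRRC ZT11529 dataset:
    • The real dataset compresses well, so the amount of data on disk is relatively small. In this case, [time to load Chunk data from disk] is less than [time to decompress and decode Page data]. In other words, disk IO is not the overall bottleneck.
    • Inside step D-1, the bottleneck is sub-step 7_2_data_decompress_PageDataByteArray. Note: in the synthetic-data experiment, sub-step 7_1_data_ByteBuffer_to_ByteArray(us) also had a high share. Our explanation is that the synthetic data compresses poorly, so 7_2_data_decompress_PageDataByteArray(us) takes relatively little time, and the relative share of 7_1_data_ByteBuffer_to_ByteArray(us) rises as a result.
    • Inside D-2, no single sub-step stands out as the bottleneck.
    • GZIP achieves the highest compression ratio of the compression methods. However, there is a tradeoff between disk-loading IO cost and decompression cost, so the overall read time under GZIP is not the smallest.
  3. Follow-up work
    1. Repeat the experiments with a larger real dataset; the CRRC data currently used is on the order of ten million points.
    2. How D-1 decompression and D-2 decoding relate, both in the space they save and in the time they cost, remains to be explored.
    3. Also measure the time cost of writing data.
    4. Note that RLE encoding is lossy for floating-point numbers.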
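The explanation in item 2 (a sub-step's share grows when 7_2 shrinks, even though its own time barely changes) is plain arithmetic over the D-1 sub-step total. The same effect is visible in the real-data table, comparing the GZIP and UNCOMPRESSED columns. A minimal Python sketch using those averages; the helper name `shares` is ours and is not part of the experiment scripts:

```python
# Average D-1 sub-step times (us) copied from the real-data compression table
# (TS_2DIFF time encoding, GORILLA value encoding).
D1_STEPS = {
    "GZIP": {
        "7_1_data_ByteBuffer_to_ByteArray": 1269.7616,
        "7_2_data_decompress_PageDataByteArray": 345856.2313,
        "7_3_data_ByteArray_to_ByteBuffer": 374.1847,
        "7_4_data_split_time_value_Buffer": 1547.4221,
    },
    "UNCOMPRESSED": {
        "7_1_data_ByteBuffer_to_ByteArray": 3197.6313,
        "7_2_data_decompress_PageDataByteArray": 2432.4552,
        "7_3_data_ByteArray_to_ByteBuffer": 421.1930,
        "7_4_data_split_time_value_Buffer": 1586.3725,
    },
}


def shares(steps: dict[str, float]) -> dict[str, float]:
    """Return each sub-step's percentage of the D-1 sub-step total."""
    total = sum(steps.values())
    return {name: 100.0 * t / total for name, t in steps.items()}


for codec, steps in D1_STEPS.items():
    s = shares(steps)
    print(
        f"{codec}: 7_1 = {s['7_1_data_ByteBuffer_to_ByteArray']:.2f}%, "
        f"7_2 = {s['7_2_data_decompress_PageDataByteArray']:.2f}%"
    )
```

7_1 takes only about 2.5x longer without compression, but its share jumps from under 1% to over 40% because the 7_2 term in the denominator almost disappears.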

Experimental setup

IoTDB version

  • v0.13.1

Environment

  • Machine: FIT Building, 166.111.130.101 / 192.168.130.31
  • CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
  • L1 cache 384KB, L2 cache 1536KB, L3 cache 12MB
  • Memory: 16 GB
  • Disk: 1.8 TB HDD, /dev/sdb1 mounted on /disk
  • OS: Ubuntu 16.04.7 LTS
  • Working directory: /disk/rl/tsfileReadExp/

...

Compression | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | GORILLA | GORILLA | GORILLA
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 19.34862614 | 23.77741051 | 23.15641212 | 36.30773735
chunkDataSize_stats_mean(MB) | 1.515139659 | 1.86007007 | 1.813184341 | 2.837062438
compressedPageSize_stats_mean(B) | 15824.16667 | 19440.275 | 18948.69 | 29684.7575
uncompressedPageSize_stats_mean(B) | 29684.7575 | 29684.7575 | 29684.7575 | 29684.7575
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 18221.26833 | 18221.26833 | 18221.26833

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
Each cell is the average time, with its share of SUM in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(A)get_chunkMetadatas | 87316.85 us (8.20%) | 100289.64 us (12.59%) | 89466.60 us (11.20%) | 88760.78 us (10.34%)
(B)load_on_disk_chunk | 160699.60 us (15.09%) | 176784.88 us (22.19%) | 105802.94 us (13.24%) | 191518.85 us (22.30%)
(C)get_pageHeader | 2436.51 us (0.23%) | 2198.37 us (0.28%) | 2319.95 us (0.29%) | 3134.23 us (0.36%)
(D_1)decompress_pageData | 356587.55 us (33.48%) | 31158.16 us (3.91%) | 115629.72 us (14.47%) | 29640.98 us (3.45%)
(D_2)decode_pageData | 457950.97 us (43.00%) | 486085.02 us (61.03%) | 485623.25 us (60.79%) | 545723.81 us (63.55%)
SUM | 1064991.48 us | 796516.07 us | 798842.47 us | 858778.65 us

[3] D_1 compare each step inside
Each cell is the average time, with its share of the D-1 sub-step total in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1269.76 us (0.36%) | 1808.58 us (7.21%) | 1687.99 us (1.41%) | 3197.63 us (41.87%)
(D-1)7_2_data_decompress_PageDataByteArray(us) | 345856.23 us (99.09%) | 21247.39 us (84.67%) | 116100.70 us (97.14%) | 2432.46 us (31.85%)
(D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 374.18 us (0.11%) | 442.11 us (1.76%) | 424.26 us (0.35%) | 421.19 us (5.51%)
(D-1)7_4_data_split_time_value_Buffer(us) | 1547.42 us (0.44%) | 1594.90 us (6.36%) | 1301.26 us (1.09%) | 1586.37 us (20.77%)

[3] D_2 compare each step inside
Each cell is the average time, with its share of the D-2 sub-step total in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(D-2)8_1_createBatchData(us) | 3430.87 us (0.23%) | 4174.90 us (0.28%) | 3438.15 us (0.23%) | 3730.62 us (0.25%)
(D-2)8_2_timeDecoder_hasNext(us) | 234016.78 us (15.59%) | 236135.95 us (15.56%) | 235278.41 us (15.65%) | 234469.65 us (15.54%)
(D-2)8_3_timeDecoder_readLong(us) | 357893.94 us (23.84%) | 360253.14 us (23.74%) | 358341.15 us (23.84%) | 363063.07 us (24.06%)
(D-2)8_4_valueDecoder_read(us) | 353821.68 us (23.57%) | 359477.89 us (23.69%) | 356440.48 us (23.71%) | 355773.19 us (23.58%)
(D-2)8_5_checkValueSatisfyOrNot(us) | 223758.31 us (14.91%) | 224938.27 us (14.83%) | 224282.16 us (14.92%) | 226053.66 us (14.98%)
(D-2)8_6_putIntoBatchData(us) | 328221.56 us (21.86%) | 332305.60 us (21.90%) | 325251.79 us (21.64%) | 325993.70 us (21.60%)
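The claims that disk IO is not the bottleneck and that GZIP trades IO for decompression can be checked directly from the (B), (D_1) and (D_2) rows above. Below is a small Python sketch over those averages. The dictionary layout and the helper `io_is_bottleneck` are ours, for illustration only.

```python
# Average per-category times (us) from the real-data compression table:
# (B) load_on_disk_chunk, (D_1) decompress_pageData, (D_2) decode_pageData.
TIMES = {
    "GZIP": (160699.6025, 356587.5455, 457950.9667),
    "LZ4": (176784.8824, 31158.1608, 486085.0215),
    "SNAPPY": (105802.9384, 115629.7179, 485623.2531),
    "UNCOMPRESSED": (191518.8528, 29640.9834, 545723.8053),
}


def io_is_bottleneck(load: float, decompress: float, decode: float) -> bool:
    """True if loading chunks from disk costs more than decompress + decode."""
    return load > decompress + decode


for codec, (load, decompress, decode) in TIMES.items():
    print(
        f"{codec:>12}: IO bottleneck={io_is_bottleneck(load, decompress, decode)}, "
        f"load+decompress={load + decompress:.0f} us"
    )
```

For every codec, loading costs less than decompression plus decoding. GZIP has the largest load + decompress total, because its extra decompression time far exceeds any IO it saves.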


Supplementary experiment

...

Both timestamps and values use PLAIN encoding; only the compression method is varied.

RLCompressionRealExpScripts.sh

Compression | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | PLAIN | PLAIN | PLAIN | PLAIN
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 50.75787832 | 74.71146107 | 75.81546879 | 195.0946026
chunkDataSize_stats_mean(MB) | 3.977135579 | 5.846096277 | 5.932730039 | 15.26517868
compressedPageSize_stats_mean(B) | 41639.28917 | 61236.7625 | 62145.18333 | 160003
uncompressedPageSize_stats_mean(B) | 160003 | 160003 | 160003 | 160003
timeBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
total_time(us) | 2096685.60 | 1244193.04 | 1195582.12 | 1895095.38

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
Each cell is the average time, with its share of SUM in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(A)get_chunkMetadatas | 86791.39 us (4.18%) | 100869.95 us (8.73%) | 99547.06 us (7.91%) | 88166.62 us (4.66%)
(B)load_on_disk_chunk | 349328.56 us (16.84%) | 452828.32 us (39.20%) | 450015.78 us (35.77%) | 1155773.90 us (61.09%)
(C)get_pageHeader | 2818.29 us (0.14%) | 3913.90 us (0.34%) | 3502.11 us (0.28%) | 4450.31 us (0.24%)
(D_1)decompress_pageData | 1350175.83 us (65.10%) | 261417.90 us (22.63%) | 395144.66 us (31.41%) | 173785.98 us (9.19%)
(D_2)decode_pageData | 285009.94 us (13.74%) | 336215.75 us (29.10%) | 309969.77 us (24.64%) | 469824.38 us (24.83%)
SUM | 2074124.01 us | 1155245.82 us | 1258179.39 us | 1892001.18 us

[3] D_1 compare each step inside
Each cell is the average time, with its share of the D-1 sub-step total in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 4312.08 us (0.34%) | 10247.37 us (5.21%) | 9355.25 us (2.65%) | 33088.57 us (56.72%)
(D-1)7_2_data_decompress_PageDataByteArray(us) | 1274619.93 us (99.49%) | 183796.93 us (93.37%) | 341381.81 us (96.64%) | 21318.84 us (36.54%)
(D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 454.69 us (0.04%) | 659.10 us (0.33%) | 604.46 us (0.17%) | 973.53 us (1.67%)
(D-1)7_4_data_split_time_value_Buffer(us) | 1779.45 us (0.14%) | 2148.43 us (1.09%) | 1902.03 us (0.54%) | 2956.57 us (5.07%)

[3] D_2 compare each step inside
Each cell is the average time, with its share of the D-2 sub-step total in parentheses.

[Avg&Per] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
--- | --- | --- | --- | ---
(D-2)8_1_createBatchData(us) | 3343.39 us (0.26%) | 3522.58 us (0.27%) | 3458.92 us (0.27%) | 3730.60 us (0.28%)
(D-2)8_2_timeDecoder_hasNext(us) | 232202.51 us (18.06%) | 231677.29 us (17.90%) | 231911.81 us (17.92%) | 237390.87 us (17.92%)
(D-2)8_3_timeDecoder_readLong(us) | 254086.67 us (19.76%) | 255389.38 us (19.73%) | 255976.55 us (19.78%) | 261059.15 us (19.71%)
(D-2)8_4_valueDecoder_read(us) | 241634.02 us (18.79%) | 242640.73 us (18.74%) | 242328.98 us (18.73%) | 246610.60 us (18.62%)
(D-2)8_5_checkValueSatisfyOrNot(us) | 230100.91 us (17.89%) | 231053.57 us (17.85%) | 231043.85 us (17.85%) | 235200.05 us (17.76%)
(D-2)8_6_putIntoBatchData(us) | 324669.90 us (25.25%) | 330206.14 us (25.51%) | 329318.79 us (25.45%) | 340641.20 us (25.72%)


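  • Once both the timestamp column and the value column use PLAIN encoding, the compressor alone accounts for the entire compression ratio, and the time share of D-1 rises markedly. Even so, for every compression method other than GZIP, the D-1 share stays below 60%. D-2 decoding still has a substantial baseline cost (roughly 285,000 to 470,000 us per query here).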
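With PLAIN on both columns, the page-level compression ratio comes entirely from the compressor. This is a Python sketch that derives that ratio from the mean page sizes in the PLAIN/PLAIN table above and lists it next to the rounded D-1 share. The dictionaries and `compression_ratio` are illustrative names of ours.

```python
# Mean page sizes in bytes from the PLAIN/PLAIN table: (uncompressed, compressed).
PAGE_SIZES = {
    "GZIP": (160003, 41639.28917),
    "LZ4": (160003, 61236.7625),
    "SNAPPY": (160003, 62145.18333),
    "UNCOMPRESSED": (160003, 160003),
}
# Share of SUM spent in (D_1)decompress_pageData, in percent (rounded from the table).
D1_SHARE = {"GZIP": 65.10, "LZ4": 22.63, "SNAPPY": 31.41, "UNCOMPRESSED": 9.19}


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Page-level ratio; with PLAIN encoding it is due to the compressor alone."""
    return uncompressed / compressed


for codec, (raw, packed) in PAGE_SIZES.items():
    ratio = compression_ratio(raw, packed)
    print(f"{codec:>12}: ratio={ratio:.2f}, D-1 share={D1_SHARE[codec]:.2f}%")
```

GZIP compresses pages about 3.8x versus roughly 2.6x for LZ4 and SNAPPY, and it is the only codec whose D-1 share exceeds 60%.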

Varying the value-column encoding

RLValueEncodingRealExpScripts.sh

Value encoding | GORILLA | PLAIN | RLE | TS_2DIFF
--- | --- | --- | --- | ---
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF
compression(co) | SNAPPY | SNAPPY | SNAPPY | SNAPPY
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 23.15641212 | 26.59960175 | 22.10147858 | 20.98550797
chunkDataSize_stats_mean(MB) | 1.813184341 | 2.081166744 | 1.740072568 | 1.642661015
compressedPageSize_stats_mean(B) | 18948.69 | 21758.61667 | 18182.29917 | 17160.8925
uncompressedPageSize_stats_mean(B) | 29684.7575 | 91463.48917 | 25245.93417 | 22210.36417
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 80000 | 13782.445 | 10746.875

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
Each cell is the average time, with its share of SUM in parentheses.

[Avg&Per] | GORILLA | PLAIN | RLE | TS_2DIFF
--- | --- | --- | --- | ---
(A)get_chunkMetadatas | 88753.03 us (11.21%) | 97256.89 us (12.18%) | 84509.51 us (9.50%) | 86370.80 us (11.67%)
(B)load_on_disk_chunk | 105215.92 us (13.29%) | 138228.61 us (17.31%) | 139816.55 us (15.71%) | 96826.53 us (13.08%)
(C)get_pageHeader | 2352.16 us (0.30%) | 2625.53 us (0.33%) | 3388.63 us (0.38%) | 3159.26 us (0.43%)
(D_1)decompress_pageData | 114457.26 us (14.46%) | 182862.36 us (22.90%) | 155021.77 us (17.42%) | 120684.50 us (16.30%)
(D_2)decode_pageData | 480759.02 us (60.74%) | 377408.40 us (47.27%) | 506978.11 us (56.98%) | 433147.34 us (58.52%)
SUM | 791537.40 us | 798381.79 us | 889714.56 us | 740188.43 us

[3] D_1 compare each step inside
Each cell is the average time, with its share of the D-1 sub-step total in parentheses.

[Avg&Per] | GORILLA | PLAIN | RLE | TS_2DIFF
--- | --- | --- | --- | ---
(D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1754.37 us (1.69%) | 2320.36 us (1.49%) | 1825.20 us (1.55%) | 1607.33 us (1.76%)
(D-1)7_2_data_decompress_PageDataByteArray(us) | 100527.57 us (96.61%) | 151532.73 us (97.29%) | 113686.73 us (96.46%) | 87212.82 us (95.40%)
(D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 432.00 us (0.42%) | 468.21 us (0.30%) | 474.29 us (0.40%) | 663.37 us (0.73%)
(D-1)7_4_data_split_time_value_Buffer(us) | 1340.37 us (1.29%) | 1429.17 us (0.92%) | 1873.78 us (1.59%) | 1932.99 us (2.11%)

[3] D_2 compare each step inside
Each cell is the average time, with its share of the D-2 sub-step total in parentheses.

[Avg&Per] | GORILLA | PLAIN | RLE | TS_2DIFF
--- | --- | --- | --- | ---
(D-2)8_1_createBatchData(us) | 3348.93 us (0.22%) | 3442.31 us (0.24%) | 3466.69 us (0.22%) | 3338.65 us (0.22%)
(D-2)8_2_timeDecoder_hasNext(us) | 235317.74 us (15.59%) | 244839.83 us (17.30%) | 239308.88 us (15.38%) | 240223.01 us (15.97%)
(D-2)8_3_timeDecoder_readLong(us) | 361562.54 us (23.95%) | 363364.42 us (25.67%) | 362143.58 us (23.27%) | 357833.11 us (23.79%)
(D-2)8_4_valueDecoder_read(us) | 356325.49 us (23.60%) | 241625.55 us (17.07%) | 396438.56 us (25.48%) | 335421.71 us (22.30%)
(D-2)8_5_checkValueSatisfyOrNot(us) | 225554.26 us (14.94%) | 232364.86 us (16.42%) | 224510.63 us (14.43%) | 230332.47 us (15.31%)
(D-2)8_6_putIntoBatchData(us) | 327591.94 us (21.70%) | 329856.98 us (23.30%) | 330303.22 us (21.23%) | 337085.33 us (22.41%)
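In this table, RLE shrinks the mean value buffer compared with GORILLA (13782.445 B vs 18221.26833 B per page), but the follow-up notes point out that RLE is lossy for floating-point values. The sketch below shows one way that can happen. It assumes the encoder first turns each DOUBLE into a fixed-point integer scaled by 10^max_point_number, and that max_point_number defaults to 2. IoTDB's documentation describes a MAX_POINT_NUMBER setting for FLOAT/DOUBLE series, but treat the exact scaling and rounding here as assumptions, not as the v0.13.1 implementation.

```python
def to_fixed_point(value: float, max_point_number: int = 2) -> int:
    """Assumed step: scale a double by 10**max_point_number and round to an integer."""
    return round(value * 10**max_point_number)


def from_fixed_point(encoded: int, max_point_number: int = 2) -> float:
    """Invert the scaling; digits beyond max_point_number were already discarded."""
    return encoded / 10**max_point_number


original = 3.14159
restored = from_fixed_point(to_fixed_point(original))
print(original, "->", restored)
```

Whatever run-length scheme follows, the integers it sees no longer carry the dropped decimal digits, so decoding cannot recover them.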


Results on synthetic data

Varying the compression method

RLCompressionSynExpScripts.sh

...

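  • Category B operations take longer than category D operations. Our explanation: the synthetic values are random integers of type INT64 stored with PLAIN encoding, and all four compression methods achieve only low compression ratios on them, so the on-disk data volume is large.
  • Inside D-1, 7_1_data_ByteBuffer_to_ByteArray takes a large share. Our explanation: the synthetic data compresses poorly, so decompression (7_2_data_decompress_PageDataByteArray) takes relatively little time, and the relative share of 7_1_data_ByteBuffer_to_ByteArray grows accordingly.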
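For reference, here is a minimal Python sketch of a generator that matches the synthetic data described in the conclusions: timestamps start at 1 with step 1, and values are drawn uniformly from the integers in [0, 100). The seed, the column names and the CSV layout are hypothetical, not what RLCompressionSynExpScripts.sh or RLValueEncodingSynExpScripts.sh actually produce. In the experiment the values are written as INT64.

```python
import csv
import io
import random


def synthetic_rows(n: int, seed: int = 0):
    """Yield (timestamp, value) pairs: timestamps 1..n, values random ints in [0, 100)."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    for t in range(1, n + 1):
        yield t, rng.randrange(100)


def write_csv(n: int) -> str:
    """Render n synthetic rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Time", "value"])  # hypothetical column names
    writer.writerows(synthetic_rows(n))
    return buf.getvalue()


print(write_csv(3))
```

Because consecutive timestamps differ by exactly 1, a delta-based time encoding like TS_2DIFF collapses them almost completely, while the uniformly random values leave little structure for any value encoding to exploit.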

...

Varying the value-column encoding

RLValueEncodingSynExpScripts.sh

...