Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

压缩方式GZIPLZ4SNAPPYUNCOMPRESSED
datasetsyntheticsyntheticsyntheticsynthetic
pagePointNum(ppn)10000100001000010000
numOfPagesInChunk(pic)1000100010001000
chunksWritten(cw)10101010
timeEncoding(te)TS_2DIFFTS_2DIFFTS_2DIFFTS_2DIFF
valueDataType(vt)INT64INT64INT64INT64
valueEncoding(ve)PLAINPLAINPLAINPLAIN
compression(co)GZIPLZ4SNAPPYUNCOMPRESSED
totalPointNum100000000100000000100000000100000000
tsfileSize(MB)767.1312866770.8444319767.9423904781.4226151
chunkDataSize_stats_mean(MB)76.7130076177.0843643676.794126478.14216614
compressedPageSize_stats_mean(B)80375.4186780764.8144480460.4778981874
uncompressedPageSize_stats_mean(B)81874818748187481874
timeBufferSize_stats_mean(B)1872187218721872
valueBufferSize_stats_mean(B)80000800008000080000
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas101654.7566 us - 0.9318259204986696%86735.1136 us - 1.1689451651044902%82918.49919999999 us - 1.0656107165107132%93896.1409 us - 1.2531889263513831%
[Avg&Per] (B)load_on_disk_chunk5949593.8917000005 us - 54.53739687304131%4849630.197000001 us - 65.35936296194448%5086618.179 us - 65.36966894765757%4851386.484999999 us - 64.74924062031137%
[Avg&Per] (C)get_pageHeader6613.381399999991 us - 0.060622054656159316%7120.158900000014 us - 0.09595969816001626%7692.346500000014 us - 0.09885667184764595%7605.696300000007 us - 0.10150975630087579%
[Avg&Per] (D_1)decompress_pageData2859428.0031000106 us - 26.211160404158996%521804.2723000004 us - 7.032452670194604%605202.8143000009 us - 7.777644443672276%498170.42259999976 us - 6.648853201570805%
[Avg&Per] (D_2)decode_pageData1991910.319299996 us - 18.25899474764487%1954657.4198999994 us - 26.343279504596413%1998880.5966999987 us - 25.68821922031179%2041517.9070999995 us - 27.247207495465567%
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us)65952.37819999998 us - 2.5136549346918415%108809.34350000018 us - 59.6896105506002%108132.35939999981 us - 43.622622294156905%110765.11740000003 us - 63.813731447511664%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us)2554687.926599999 us - 97.36728361519128%68904.38600000006 us - 37.79892271446546%135345.91170000008 us - 54.601079805416944%57547.215800000035 us - 33.15396273496119%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us)811.8335000000624 us - 0.03094155721322126%1239.949800000022 us - 0.680200047933345%1184.7460000000272 us - 0.47794876167766753%1229.1439000000355 us - 0.7081314098345313%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us)2312.0582000000422 us - 0.08811989290364733%3338.2513999999974 us - 1.8312666870009688%3218.3657999999955 us - 1.29834913874849%4034.201500000018 us - 2.3241744076926314%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us)5384.7852 us - 0.053292019060348375%5848.7599 us - 0.05759123169122766%5913.4963 us - 0.058362326692940975%6019.3023 us - 0.05943520403215091%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us)1859842.2956 us - 18.406444711361424%1862234.7849 us - 18.336946086748988%1864092.3926 us - 18.397368271414525%1857778.6739 us - 18.343895858133802%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us)2074757.7936 us - 20.533415498567617%2084700.4377 us - 20.527508047369906%2063043.8916 us - 20.360888969091857%2069930.4964 us - 20.43870456313607%
[Avg&Per] (D-2)8_4_valueDecoder_read(us)1876012.952 us - 18.56648209392724%1881471.5433999998 us - 18.526365490982297%1877809.2412 us - 18.532744562964893%1876843.1276 us - 18.53214021585961%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us)1780379.6374 us - 17.620020492363725%1780782.3133 us - 17.534904586680103%1781949.2049 us - 17.586668929952697%1780599.5789 us - 17.5818216126948%
[Avg&Per] (D-2)8_6_putIntoBatchData(us)2507922.0072 us - 24.82034518471963%2540605.1784 us - 25.016684556527476%2539577.912 us - 25.063966939883077%2536332.2055 us - 25.044002546143567%


B类操作耗时超过D类操作耗时。分析原因:使用的人工数据数值是INT64类型的随机取整的数,且PLAIN编码,且四种压缩方式此时的压缩率都不高,所以磁盘数据量偏大。

...

压缩方式GZIPLZ4SNAPPYUNCOMPRESSED
dataset/disk/rl/zc_data/ZT11529.csv/disk/rl/zc_data/ZT11529.csv/disk/rl/zc_data/ZT11529.csv/disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn)10000100001000010000
numOfPagesInChunk(pic)100100100100
chunksWritten(cw)13131313
timeEncoding(te)TS_2DIFFTS_2DIFFTS_2DIFFTS_2DIFF
valueDataType(vt)DOUBLEDOUBLEDOUBLEDOUBLE
valueEncoding(ve)GORILLAGORILLAGORILLAGORILLA
compression(co)GZIPLZ4SNAPPYUNCOMPRESSED
totalPointNum12780287127802871278028712780287
tsfileSize(MB)19.3486261423.7774105123.1564121236.30773735
chunkDataSize_stats_mean(MB)1.5151396591.860070071.8131843412.837062438
compressedPageSize_stats_mean(B)15824.1666719440.27518948.6929684.7575
uncompressedPageSize_stats_mean(B)29684.757529684.757529684.757529684.7575
timeBufferSize_stats_mean(B)11461.462511461.462511461.462511461.4625
valueBufferSize_stats_mean(B)18221.2683318221.2683318221.2683318221.26833
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas87316.85010000001 us - 8.19883087518919%100289.6416 us - 12.59103802007131%89466.6045 us - 11.199530364576052%88760.777 us - 10.335699109318087%
[Avg&Per] (B)load_on_disk_chunk160699.6025 us - 15.089285299443361%176784.88239999997 us - 22.19476647997349%105802.9384 us - 13.244531050378352%191518.85280000002 us - 22.301305860612082%
[Avg&Per] (C)get_pageHeader2436.5129999999995 us - 0.22878239411203669%2198.3668000000007 us - 0.27599779517870476%2319.9517000000005 us - 0.29041416798711567%3134.228800000001 us - 0.3649635223062446%
[Avg&Per] (D_1)decompress_pageData356587.5454999999 us - 33.482666568996265%31158.160799999983 us - 3.911805656191469%115629.7179 us - 14.474658381255693%29640.983400000016 us - 3.4515277590088287%
[Avg&Per] (D_2)decode_pageData457950.96670000016 us - 43.000434862259155%486085.0215000002 us - 61.02639204858503%485623.25309999986 us - 60.790866035802786%545723.8052999998 us - 63.54650374875476%
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us)1269.7615999999996 us - 0.36377892330196115%1808.5776999999991 us - 7.207505165727991%1687.9916000000003 us - 1.412377327892232%3197.6313 us - 41.86667970732365%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us)345856.23130000004 us - 99.08569249502276%21247.39440000001 us - 84.67466169480036%116100.69619999993 us - 97.14384305311928%2432.4552000000012 us - 31.84820675254647%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us)374.18470000000025 us - 0.10720162531460037%442.10720000000003 us - 1.761876157051776%424.26030000000026 us - 0.35498732863644405%421.1930000000002 us - 5.51469221169019%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us)1547.4220999999993 us - 0.443326956360674%1594.8989000000004 us - 6.355956982419887%1301.2614999999998 us - 1.0887922903520593%1586.372500000001 us - 20.770421328439692%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us)3430.8672 us - 0.22855030377902266%4174.8958 us - 0.275155541260775%3438.1465 us - 0.22874736939899784%3730.6205 us - 0.24721094086707432%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us)234016.77980000002 us - 15.589238228946503%236135.9462 us - 15.56305048087338%235278.4135 us - 15.65358490817499%234469.6457 us - 15.537217392727715%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us)357893.9434 us - 23.841426880277478%360253.1425 us - 23.743262865502565%358341.1524 us - 23.84121675993312%363063.0738 us - 24.058508247673558%
[Avg&Per] (D-2)8_4_valueDecoder_read(us)353821.6809 us - 23.570149451806063%359477.8899 us - 23.692168165422427%356440.4841 us - 23.714761161335133%355773.1939 us - 23.5754416723178%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us)223758.3096 us - 14.905861011513531%224938.2721 us - 14.825043539994219%224282.1638 us - 14.921980483485841%226053.6587 us - 14.979528915812129%
[Avg&Per] (D-2)8_6_putIntoBatchData(us)328221.5562 us - 21.864774123677403%332305.597 us - 21.90131940694663%325251.7878 us - 21.639709317671908%325993.7043 us - 21.602092830601723%


  • 相对其它压缩方法,GZIP的压缩率最高,同时它的D-1解压缩步骤耗时占比也更高。
  • 真实数据集的压缩率高,磁盘数据量少,D类操作耗时超过B类操作耗时,即整体耗时瓶颈在D类操作。
  • 真实数据集的压缩率高,D-1步骤内部的主要耗时瓶颈是子步骤7_2_data_decompress_PageDataByteArray(us)。
    • 人工数据实验里发现另一个子步骤7_1_data_ByteBuffer_to_ByteArray(us)的占比也高,其主要因为人工数据实验里数据压缩率很低,从而子步骤7_2_data_decompress_PageDataByteArray(us)耗时少,从而相对来说7_1_data_ByteBuffer_to_ByteArray(us)占比高了。
  • 本实验里,D-2类操作内部没有突出的耗时瓶颈子步骤。

...