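1. Experimental setup
Objectives
- Compare the disk I/O cost and the CPU cost of reading a TsFile.
- Determine whether any single operation stands out as a time bottleneck within the CPU cost.
IoTDB version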
- v0.13.1
Experimental environment
- CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
- L1 cache 284KB, L2 cache 1536KB, L3 cache 12MB
- Memory: 16 GB
- Disk: 1.8 TB HDD, /dev/sdb1 mounted on /disk
- OS: Ubuntu 16.04.7 LTS
RLTsFileReadCostBench usage
(1) Write a TsFile from synthetic data
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_SYN [pagePointNum] [numOfPagesInChunk] [chunksWritten] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
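- WRITE_SYN: selects the "write synthetic data" mode (the other modes are "write real data" and "read data")
- pagePointNum(ppn): number of points in a page
- numOfPagesInChunk(pic): number of pages in a chunk
- chunksWritten(cw): total number of chunks written
- timeEncoding(te): encoding of the timestamp column
- valueDataType(vt): data type of the value column
- valueEncoding(ve): encoding of the value column
- compressionType(co): compression method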
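As an illustration of how the positional arguments line up, the sketch below assembles a WRITE_SYN command line. It is not part of the benchmark: the helper name `build_write_syn_command` is hypothetical, the jar is assumed to sit in the working directory, and the parameter values are the ones reported in the GZIP column of the synthetic-data results in this report. The code only builds the argument list; it does not launch the jar.

```python
JAR = "RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar"


def build_write_syn_command(ppn, pic, cw, te, vt, ve, co, jar=JAR):
    """Return the argv list for a WRITE_SYN run (hypothetical helper)."""
    return ["java", "-jar", jar, "WRITE_SYN",
            str(ppn), str(pic), str(cw), te, vt, ve, co]


# Values taken from the GZIP column of the synthetic-data results table.
cmd = build_write_syn_command(10000, 1000, 10, "TS_2DIFF", "INT64", "PLAIN", "GZIP")

# Points written = points per page * pages per chunk * chunks written.
total_points = 10000 * 1000 * 10

print(" ".join(cmd))
print(total_points)
```

The product ppn * pic * cw should agree with the totalPointNum row reported for the synthetic experiments.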
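(2) Write a TsFile from real data
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_REAL [path_of_real_data_csv_to_write] [pagePointNum] [numOfPagesInChunk] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
- WRITE_REAL: selects the "write real data" mode (the other modes are "write synthetic data" and "read data")
- path_of_real_data_csv_to_write: path of the real-dataset CSV file used to write the TsFile
- pagePointNum(ppn): number of points in a page
- numOfPagesInChunk(pic): number of pages in a chunk
- timeEncoding(te): encoding of the timestamp column
- valueDataType(vt): data type of the value column
- valueEncoding(ve): encoding of the value column
- compressionType(co): compression method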
(3) Read experiment
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar READ [path_of_tsfile_to_read] [decomposeMeasureTime] [D_decompose_each_step] (timeEncoding)
- READ: selects the "read data" mode (the other modes are "write synthetic data" and "write real data")
- path_of_tsfile_to_read: path of the TsFile to read
- decomposeMeasureTime: FALSE measures the read process as a whole, in which case D_decompose_each_step has no effect. TRUE measures the decomposed read process, with the decomposition granularity controlled by D_decompose_each_step.
- D_decompose_each_step: only relevant when decomposeMeasureTime is TRUE. D_decompose_each_step=FALSE measures the "(D_1)decompress_pageData" and "(D_2)decode_pageData" steps without further decomposition; D_decompose_each_step=TRUE breaks these two steps down further and measures the substeps inside them.
- timeEncoding(te): optional; if timeEncoding is not specified, TS_2DIFF is used by default.
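Control parameters | decomposeMeasureTime=FALSE | decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE | decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE |
---|---|---|---|
Smallest measured steps | total_time(us) | steps (A)1 to (D-2)8, as listed under "[1] each step" in the results | D-1 substeps 7_1 to 7_4 and D-2 substeps 8_1 to 8_6, as listed under "[3]" in the results |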
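In terms of the objectives above:
- Objective 1: decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE. Compare the time spent in category B operations (where the disk I/O cost lies) with the time spent in category D operations (where most of the CPU cost lies).
- Objective 2: decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE. Analyze the share of time taken by each substep inside D-1, and likewise by each substep inside D-2.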
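The mapping from objective to flag combination can be sketched as follows. This is a minimal illustration, not the project's tooling: `MODES`, `build_read_command`, and the example path `/disk/example.tsfile` are hypothetical names introduced here, and the code only constructs the argument list for the READ mode described above.

```python
JAR = "RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar"

# (decomposeMeasureTime, D_decompose_each_step) for each objective.
MODES = {
    "objective_1": ("TRUE", "FALSE"),  # compare category B against category D
    "objective_2": ("TRUE", "TRUE"),   # break D-1 and D-2 into substeps
}


def build_read_command(tsfile_path, objective, time_encoding=None, jar=JAR):
    """Return the argv list for a READ run (hypothetical helper)."""
    decompose, each_step = MODES[objective]
    cmd = ["java", "-jar", jar, "READ", tsfile_path, decompose, each_step]
    if time_encoding is not None:
        # When omitted, the benchmark falls back to TS_2DIFF.
        cmd.append(time_encoding)
    return cmd


cmd_obj1 = build_read_command("/disk/example.tsfile", "objective_1")
cmd_obj2 = build_read_command("/disk/example.tsfile", "objective_2", "TS_2DIFF")
print(" ".join(cmd_obj1))
print(" ".join(cmd_obj2))
```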
Automation scripts
Input: see the write parameters and read parameters of RLTsFileReadCostBench.
Output: one TsFile, one TsFile space-statistics file (*writeResult.csv), and one read-time result CSV file (*readResult-T*csv).
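Input:
- WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
- Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the repeated read results
- FILE_NAME: path of the TsFile to read
- decomposeMeasureTime: see the read parameters of RLTsFileReadCostBench
- D_decompose_each_step: see the read parameters of RLTsFileReadCostBench
- te: see the read parameters of RLTsFileReadCostBench
- REPEAT: number of times the read experiment is repeated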
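Output:
- REPEAT read-time result CSV files (*readResult-T*csv)
- one CSV file that concatenates the repeated read results horizontally (*readResult-combined.csv)
- one CSV file that joins the write results and the read results (*allResult-combined.csv)
- one CSV file with the averaged read results and percentages computed at different granularities (*allResult-combined-processed.csv)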
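The averaging and percentage step can be pictured with the sketch below. The actual implementation of RLRepeatReadResultAvgPercCalculator and the layout of its CSV files are not shown in this report, so this is only an assumption about the computation, chosen to be consistent with the "[Avg&Per]" rows in the results. The function name, the step names, and the timings are made up for illustration.

```python
from statistics import mean


def average_and_percent(runs):
    """Average each step over the repetitions, then express it as a share of the total.

    runs: list of {step_name: elapsed_us} dicts, one dict per repetition.
    Returns {step_name: (average_us, percent_of_total)}.
    """
    steps = list(runs[0].keys())
    avg = {s: mean(r[s] for r in runs) for s in steps}
    total = sum(avg.values())
    return {s: (avg[s], 100.0 * avg[s] / total) for s in steps}


# Made-up timings for two repetitions, for illustration only.
runs = [
    {"B_load_on_disk_chunk": 60.0, "D_load_in_memory_pageData": 40.0},
    {"B_load_on_disk_chunk": 80.0, "D_load_in_memory_pageData": 20.0},
]
result = average_and_percent(runs)
for step, (avg_us, pct) in result.items():
    print(f"{step}: {avg_us} us - {pct}%")
```

Averaging before taking percentages keeps a single slow repetition from dominating the reported shares.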
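Tool paths:
- WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
- Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the repeated read results
- TOOL_PATH: path of RLtool.sh, a helper script that replaces variable values in the scripts
- READ_SCRIPT_PATH: path of RLReadExpScripts.sh
Write parameters: see the write parameters of RLTsFileReadCostBench.
Read parameters: see the read parameters of RLTsFileReadCostBench.
- REPEAT: number of times the read experiment is repeated
Output: for each compression method, one TsFile, one TsFile space-statistics file (*writeResult.csv), REPEAT read-time result CSV files (*readResult-T*csv), one CSV file that concatenates the repeated read results horizontally (*readResult-combined.csv), one CSV file that joins the write results and the read results (*allResult-combined.csv), and one CSV file with the averaged read results and percentages computed at different granularities (*allResult-combined-processed.csv).
The other scripts are similar and are not described further.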
2. Experimental results
Varying the compression method
Results on synthetic data
RLCompressionSynExpScripts.sh
Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
---|---|---|---|---|
dataset | synthetic | synthetic | synthetic | synthetic |
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000 |
chunksWritten(cw) | 10 | 10 | 10 | 10 |
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
valueDataType(vt) | INT64 | INT64 | INT64 | INT64 |
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN |
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
totalPointNum | 100000000 | 100000000 | 100000000 | 100000000 |
tsfileSize(MB) | 767.1312866 | 770.8444319 | 767.9423904 | 781.4226151 |
chunkDataSize_stats_mean(MB) | 76.71300761 | 77.08436436 | 76.7941264 | 78.14216614 |
compressedPageSize_stats_mean(B) | 80375.41867 | 80764.81444 | 80460.47789 | 81874 |
uncompressedPageSize_stats_mean(B) | 81874 | 81874 | 81874 | 81874 |
timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872 |
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
[1] each step | ||||
[Avg&Per] (A)1_index_read_deserialize_MagicString_FileMetadataSize(us) | 26642.8733 us - 0.24422388846191903% | 11918.6528 us - 0.16062988779113208% | 10188.2737 us - 0.1309325873262339% | 10953.8906 us - 0.14619657707769018% |
[Avg&Per] (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us) | 5777.7715 us - 0.05296237408352104% | 5484.9663 us - 0.07392190510886776% | 5140.9507 us - 0.0660679126109081% | 6219.5857 us - 0.08300997092132265% |
[Avg&Per] (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us) | 69234.1118 us - 0.6346396579532295% | 69331.4945 us - 0.9343933722044904% | 67589.2748 us - 0.8686102165735712% | 76722.6646 us - 1.0239823783523703% |
[Avg&Per] (B)4_data_read_deserialize_ChunkHeader(us) | 8684.7625 us - 0.07960952425196037% | 10008.712599999999 us - 0.13488927052826724% | 4487.9059 us - 0.05767543633654741% | 7069.0819 us - 0.09434780888370882% |
[Avg&Per] (B)5_data_read_ChunkData(us) | 5940909.1292 us - 54.457787348789346% | 4839621.4844 us - 65.22447369141621% | 5082130.2731 us - 65.31199351132103% | 4844317.4031 us - 64.65489281142766% |
[Avg&Per] (C)6_data_deserialize_PageHeader(us) | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579% |
[Avg&Per] (D-1)7_data_decompress_PageData(us) | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805% |
[Avg&Per] (D-2)8_data_decode_PageData(us) | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567% |
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData | ||||
[Avg&Per] (A)get_chunkMetadatas | 101654.7566 us - 0.9318259204986696% | 86735.1136 us - 1.1689451651044902% | 82918.49919999999 us - 1.0656107165107132% | 93896.1409 us - 1.2531889263513831% |
[Avg&Per] (B)load_on_disk_chunk | 5949593.8917000005 us - 54.53739687304131% | 4849630.197000001 us - 65.35936296194448% | 5086618.179 us - 65.36966894765757% | 4851386.484999999 us - 64.74924062031137% |
[Avg&Per] (C)get_pageHeader | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579% |
[Avg&Per] (D_1)decompress_pageData | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805% |
[Avg&Per] (D_2)decode_pageData | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567% |
[3] D_1 compare each step inside | ||||
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 65952.37819999998 us - 2.5136549346918415% | 108809.34350000018 us - 59.6896105506002% | 108132.35939999981 us - 43.622622294156905% | 110765.11740000003 us - 63.813731447511664% |
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 2554687.926599999 us - 97.36728361519128% | 68904.38600000006 us - 37.79892271446546% | 135345.91170000008 us - 54.601079805416944% | 57547.215800000035 us - 33.15396273496119% |
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 811.8335000000624 us - 0.03094155721322126% | 1239.949800000022 us - 0.680200047933345% | 1184.7460000000272 us - 0.47794876167766753% | 1229.1439000000355 us - 0.7081314098345313% |
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 2312.0582000000422 us - 0.08811989290364733% | 3338.2513999999974 us - 1.8312666870009688% | 3218.3657999999955 us - 1.29834913874849% | 4034.201500000018 us - 2.3241744076926314% |
[3] D_2 compare each step inside | ||||
[Avg&Per] (D-2)8_1_createBatchData(us) | 5384.7852 us - 0.053292019060348375% | 5848.7599 us - 0.05759123169122766% | 5913.4963 us - 0.058362326692940975% | 6019.3023 us - 0.05943520403215091% |
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1859842.2956 us - 18.406444711361424% | 1862234.7849 us - 18.336946086748988% | 1864092.3926 us - 18.397368271414525% | 1857778.6739 us - 18.343895858133802% |
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2074757.7936 us - 20.533415498567617% | 2084700.4377 us - 20.527508047369906% | 2063043.8916 us - 20.360888969091857% | 2069930.4964 us - 20.43870456313607% |
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 1876012.952 us - 18.56648209392724% | 1881471.5433999998 us - 18.526365490982297% | 1877809.2412 us - 18.532744562964893% | 1876843.1276 us - 18.53214021585961% |
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1780379.6374 us - 17.620020492363725% | 1780782.3133 us - 17.534904586680103% | 1781949.2049 us - 17.586668929952697% | 1780599.5789 us - 17.5818216126948% |
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2507922.0072 us - 24.82034518471963% | 2540605.1784 us - 25.016684556527476% | 2539577.912 us - 25.063966939883077% | 2536332.2055 us - 25.044002546143567% |
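To make the comparison between disk I/O and CPU cost concrete, the sketch below recomputes the category shares for the GZIP column from the "[2] category" averages in the table above (values rounded to four decimals). The helper name `shares` is introduced here for illustration; treating category B as the I/O cost and D_1 plus D_2 as the CPU cost follows the objectives stated in the setup section.

```python
# Category averages in microseconds, GZIP column, synthetic data.
gzip_syn = {
    "A": 101654.7566,     # get_chunkMetadatas
    "B": 5949593.8917,    # load_on_disk_chunk
    "C": 6613.3814,       # get_pageHeader
    "D_1": 2859428.0031,  # decompress_pageData
    "D_2": 1991910.3193,  # decode_pageData
}


def shares(categories):
    """Return each category's percentage of the summed time."""
    total = sum(categories.values())
    return {name: 100.0 * value / total for name, value in categories.items()}


s = shares(gzip_syn)
io_share = s["B"]
cpu_share = s["D_1"] + s["D_2"]
print(f"B (disk I/O): {io_share:.2f}%")
print(f"D (CPU):      {cpu_share:.2f}%")
```

The recomputed shares should match the percentages printed in the table, which is a quick way to confirm that each percentage is taken over the sum of the five categories.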
RLCompressionRealExpScripts.sh
The ZT11529 sensor data is shown in the figure below; it contains 12,780,287 points in total.
Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
---|---|---|---|---|
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv |
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100 |
chunksWritten(cw) | 13 | 13 | 13 | 13 |
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
valueEncoding(ve) | GORILLA | GORILLA | GORILLA | GORILLA |
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287 |
tsfileSize(MB) | 19.34862614 | 23.77741051 | 23.15641212 | 36.30773735 |
chunkDataSize_stats_mean(MB) | 1.515139659 | 1.86007007 | 1.813184341 | 2.837062438 |
compressedPageSize_stats_mean(B) | 15824.16667 | 19440.275 | 18948.69 | 29684.7575 |
uncompressedPageSize_stats_mean(B) | 29684.7575 | 29684.7575 | 29684.7575 | 29684.7575 |
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625 |
valueBufferSize_stats_mean(B) | 18221.26833 | 18221.26833 | 18221.26833 | 18221.26833 |
[1] each step | ||||
[Avg&Per] (A)1_index_read_deserialize_MagicString_FileMetadataSize(us) | 10294.8866 us - 0.9666637540862395% | 17480.267 us - 2.1945906166045948% | 10013.4112 us - 1.2534900973847278% | 12118.2366 us - 1.4111012934589997% |
[Avg&Per] (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us) | 5024.5339 us - 0.4717909959598364% | 8883.9004 us - 1.115344774578661% | 4736.353 us - 0.5929020055841159% | 5782.8723 us - 0.6733833355290506% |
[Avg&Per] (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us) | 71997.4296 us - 6.760376125143112% | 73925.4742 us - 9.281102628888053% | 74716.8403 us - 9.353138261607208% | 70859.6681 us - 8.251214480330036% |
[Avg&Per] (B)4_data_read_deserialize_ChunkHeader(us) | 8457.0026 us - 0.7940911055429293% | 4862.6532 us - 0.6104902793831642% | 4758.252799999999 us - 0.5956434472253724% | 2203.0548 us - 0.25653348589027036% |
[Avg&Per] (B)5_data_read_ChunkData(us) | 152242.5999 us - 14.295194193900432% | 171922.22919999997 us - 21.584276200590327% | 101044.6856 us - 12.648887603152982% | 189315.798 us - 22.04477237472181% |
[Avg&Per] (C)6_data_deserialize_PageHeader(us) | 2436.5129999999995 us - 0.22878239411203669% | 2198.3668000000007 us - 0.27599779517870476% | 2319.9517000000005 us - 0.29041416798711567% | 3134.228800000001 us - 0.3649635223062446% |
[Avg&Per] (D-1)7_data_decompress_PageData(us) | 356587.5454999999 us - 33.482666568996265% | 31158.160799999983 us - 3.911805656191469% | 115629.7179 us - 14.474658381255693% | 29640.983400000016 us - 3.4515277590088287% |
[Avg&Per] (D-2)8_data_decode_PageData(us) | 457950.96670000016 us - 43.000434862259155% | 486085.0215000002 us - 61.02639204858503% | 485623.25309999986 us - 60.790866035802786% | 545723.8052999998 us - 63.54650374875476% |
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData | ||||
[Avg&Per] (A)get_chunkMetadatas | 87316.85010000001 us - 8.19883087518919% | 100289.6416 us - 12.59103802007131% | 89466.6045 us - 11.199530364576052% | 88760.777 us - 10.335699109318087% |
[Avg&Per] (B)load_on_disk_chunk | 160699.6025 us - 15.089285299443361% | 176784.88239999997 us - 22.19476647997349% | 105802.9384 us - 13.244531050378352% | 191518.85280000002 us - 22.301305860612082% |
[Avg&Per] (C)get_pageHeader | 2436.5129999999995 us - 0.22878239411203669% | 2198.3668000000007 us - 0.27599779517870476% | 2319.9517000000005 us - 0.29041416798711567% | 3134.228800000001 us - 0.3649635223062446% |
[Avg&Per] (D_1)decompress_pageData | 356587.5454999999 us - 33.482666568996265% | 31158.160799999983 us - 3.911805656191469% | 115629.7179 us - 14.474658381255693% | 29640.983400000016 us - 3.4515277590088287% |
[Avg&Per] (D_2)decode_pageData | 457950.96670000016 us - 43.000434862259155% | 486085.0215000002 us - 61.02639204858503% | 485623.25309999986 us - 60.790866035802786% | 545723.8052999998 us - 63.54650374875476% |
[3] D_1 compare each step inside | ||||
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1269.7615999999996 us - 0.36377892330196115% | 1808.5776999999991 us - 7.207505165727991% | 1687.9916000000003 us - 1.412377327892232% | 3197.6313 us - 41.86667970732365% |
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 345856.23130000004 us - 99.08569249502276% | 21247.39440000001 us - 84.67466169480036% | 116100.69619999993 us - 97.14384305311928% | 2432.4552000000012 us - 31.84820675254647% |
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 374.18470000000025 us - 0.10720162531460037% | 442.10720000000003 us - 1.761876157051776% | 424.26030000000026 us - 0.35498732863644405% | 421.1930000000002 us - 5.51469221169019% |
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1547.4220999999993 us - 0.443326956360674% | 1594.8989000000004 us - 6.355956982419887% | 1301.2614999999998 us - 1.0887922903520593% | 1586.372500000001 us - 20.770421328439692% |
[3] D_2 compare each step inside | ||||
[Avg&Per] (D-2)8_1_createBatchData(us) | 3430.8672 us - 0.22855030377902266% | 4174.8958 us - 0.275155541260775% | 3438.1465 us - 0.22874736939899784% | 3730.6205 us - 0.24721094086707432% |
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 234016.77980000002 us - 15.589238228946503% | 236135.9462 us - 15.56305048087338% | 235278.4135 us - 15.65358490817499% | 234469.6457 us - 15.537217392727715% |
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 357893.9434 us - 23.841426880277478% | 360253.1425 us - 23.743262865502565% | 358341.1524 us - 23.84121675993312% | 363063.0738 us - 24.058508247673558% |
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 353821.6809 us - 23.570149451806063% | 359477.8899 us - 23.692168165422427% | 356440.4841 us - 23.714761161335133% | 355773.1939 us - 23.5754416723178% |
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 223758.3096 us - 14.905861011513531% | 224938.2721 us - 14.825043539994219% | 224282.1638 us - 14.921980483485841% | 226053.6587 us - 14.979528915812129% |
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 328221.5562 us - 21.864774123677403% | 332305.597 us - 21.90131940694663% | 325251.7878 us - 21.639709317671908% | 325993.7043 us - 21.602092830601723% |
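- Compared with the other compression methods, GZIP achieves the highest compression ratio, and its D-1 decompression step also takes a larger share of the time.
- The real dataset has a high compression ratio, so the amount of data on disk is small.
- Because the real dataset has a high compression ratio, the main time bottleneck inside step D-1 is substep 7_2_data_decompress_PageDataByteArray(us).
- In the synthetic-data experiment, another substep, 7_1_data_ByteBuffer_to_ByteArray(us), also takes a large share (about 44% to 64% of D-1 with LZ4, SNAPPY and UNCOMPRESSED). The main reason is that the synthetic data compresses very poorly, so substep 7_2_data_decompress_PageDataByteArray(us) takes little time, which makes the share of 7_1_data_ByteBuffer_to_ByteArray(us) relatively high.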
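The difference in compressibility behind these observations can be checked directly from the mean page sizes in the two results tables. The sketch below is an illustration only; `compressed_fraction` is a name introduced here, and the figures are the compressedPageSize_stats_mean and uncompressedPageSize_stats_mean values reported above.

```python
# (compressed bytes, uncompressed bytes) per page, means from the results tables.
page_sizes = {
    ("synthetic", "GZIP"): (80375.41867, 81874.0),
    ("synthetic", "LZ4"): (80764.81444, 81874.0),
    ("synthetic", "SNAPPY"): (80460.47789, 81874.0),
    ("real", "GZIP"): (15824.16667, 29684.7575),
    ("real", "LZ4"): (19440.275, 29684.7575),
    ("real", "SNAPPY"): (18948.69, 29684.7575),
}


def compressed_fraction(compressed, uncompressed):
    """Compressed size as a fraction of the uncompressed size (lower means better compression)."""
    return compressed / uncompressed


fractions = {key: compressed_fraction(c, u) for key, (c, u) in page_sizes.items()}
for (dataset, codec), frac in fractions.items():
    print(f"{dataset:9s} {codec:7s} {frac:.3f}")
```

With these numbers the synthetic pages shrink by less than 2% under every codec, while the real pages shrink to roughly 53% to 65% of their original size, with GZIP the smallest in both datasets.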
Varying the encoding method
Results on synthetic data
RLValueEncodingSynExpScripts.sh
RLValueEncodingRealExpScripts.sh