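1. Experimental Setup
Purpose
- Compare the disk I/O cost and the CPU cost of reading a TsFile
- Find out whether any single operation stands out as a time-consuming bottleneck within the CPU cost
IoTDB version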
- v0.13.1
Environment
- CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
- L1 cache 284KB, L2 cache 1536KB, L3 cache 12MB
- Memory: 16 GB
- Disk: 1.8 TB HDD, /dev/sdb1 mounted on /disk
- OS: Ubuntu 16.04.7 LTS
RLTsFileReadCostBench usage
(1) Write a TsFile with synthetic data
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_SYN [pagePointNum] [numOfPagesInChunk] [chunksWritten] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
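- WRITE_SYN: selects the "write synthetic data" mode (the three modes are write synthetic data, write real data, and read data)
- pagePointNum(ppn): number of points in a page
- numOfPagesInChunk(pic): number of pages in a chunk
- chunksWritten(cw): total number of chunks written
- timeEncoding(te): encoding of the timestamp column
- valueDataType(vt): data type of the value column
- valueEncoding(ve): encoding of the value column
- compressionType(co): compression method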
(2) Write a TsFile with real data
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_REAL [path_of_real_data_csv_to_write] [pagePointNum] [numOfPagesInChunk] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
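- WRITE_REAL: selects the "write real data" mode (the three modes are write synthetic data, write real data, and read data)
- path_of_real_data_csv_to_write: path of the real-dataset CSV file used to write the TsFile
- pagePointNum(ppn): number of points in a page
- numOfPagesInChunk(pic): number of pages in a chunk
- timeEncoding(te): encoding of the timestamp column
- valueDataType(vt): data type of the value column
- valueEncoding(ve): encoding of the value column
- compressionType(co): compression method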
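As a quick sanity check on the WRITE_SYN size parameters, the sketch below computes how many points a synthetic file holds and how large one page's value buffer is. The function names are made up for illustration, and the 8-bytes-per-value figure assumes an INT64 value column with PLAIN encoding.

```python
def total_point_num(page_point_num: int, num_of_pages_in_chunk: int, chunks_written: int) -> int:
    """Points written = points per page * pages per chunk * chunks written."""
    return page_point_num * num_of_pages_in_chunk * chunks_written


def plain_int64_value_buffer_bytes(page_point_num: int) -> int:
    """Value buffer size of one page, assuming PLAIN-encoded INT64 (8 bytes per value)."""
    return page_point_num * 8


# Example settings: ppn=10000, pic=1000, cw=10
print(total_point_num(10_000, 1_000, 10))      # 100000000
print(plain_int64_value_buffer_bytes(10_000))  # 80000
```

With ppn=10000, pic=1000 and cw=10 this gives 100000000 points and an 80000-byte value buffer per page.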
(3) Read experiment
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar READ [path_of_tsfile_to_read] [decomposeMeasureTime] [D_decompose_each_step] (timeEncoding)
- READ: selects the "read data" mode (the three modes are write synthetic data, write real data, and read data)
- path_of_tsfile_to_read: path of the TsFile to read
- decomposeMeasureTime: FALSE measures the read process as a whole, in which case D_decompose_each_step is ignored. TRUE measures the decomposed read process, with the decomposition granularity controlled by D_decompose_each_step.
- D_decompose_each_step: only takes effect when decomposeMeasureTime is TRUE. D_decompose_each_step=FALSE measures the "(D_1)decompress_pageData" and "(D_2)decode_pageData" steps without further decomposition; D_decompose_each_step=TRUE breaks these two steps down further and measures the substeps inside them.
- timeEncoding(te): optional. If timeEncoding is not specified, TS_2DIFF is used by default.
Control parameters | decomposeMeasureTime=FALSE | decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE | decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE |
---|---|---|---|
Smallest measured step | total_time(us) | | |
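In terms of the purposes above:
- Purpose 1: decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE. Compare the time spent in category B operations (where the disk I/O cost lies) with the time spent in category D operations (where most of the CPU cost lies).
- Purpose 2: decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE. Analyze the share of time taken by each substep inside D-1, and by each substep inside D-2.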
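The two settings above can be turned into READ invocations with a small helper. This is only a sketch: it follows the argument order of the READ usage line, assumes the flags are passed as the literal strings TRUE and FALSE, and uses a hypothetical file name test.tsfile. It builds the argument list and does not run the jar.

```python
from typing import List, Optional

JAR = "RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar"


def build_read_command(tsfile_path: str,
                       decompose_measure_time: bool,
                       decompose_each_step: bool,
                       time_encoding: Optional[str] = None,
                       jar_path: str = JAR) -> List[str]:
    """Build the argument list for READ mode, in the order of the usage line."""

    def flag(value: bool) -> str:
        # Assumption: booleans are written as TRUE / FALSE on the command line.
        return "TRUE" if value else "FALSE"

    cmd = ["java", "-jar", jar_path, "READ", tsfile_path,
           flag(decompose_measure_time), flag(decompose_each_step)]
    if time_encoding is not None:
        # Optional argument; when omitted the tool falls back to TS_2DIFF.
        cmd.append(time_encoding)
    return cmd


# Purpose 1: compare category B (disk I/O) with category D (CPU).
purpose_1 = build_read_command("test.tsfile", True, False)
# Purpose 2: break D-1 and D-2 down into their substeps.
purpose_2 = build_read_command("test.tsfile", True, True, "TS_2DIFF")

print(" ".join(purpose_1))
print(" ".join(purpose_2))
```

The resulting list can be handed to `subprocess.run` once the jar and the TsFile path are in place.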
Automation scripts
Input: see the write-data and read-data parameters of RLTsFileReadCostBench
Output: one TsFile, one TsFile space statistics file (*writeResult.csv), and one CSV file of TsFile read timing results (*readResult-T*csv)
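Input:
- WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
- Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the results of several repeated read experiments
- FILE_NAME: path of the TsFile to read
- decomposeMeasureTime: see the RLTsFileReadCostBench read-data parameters
- D_decompose_each_step: see the RLTsFileReadCostBench read-data parameters
- te: see the RLTsFileReadCostBench read-data parameters
- REPEAT: number of times the read experiment is repeated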
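Output:
- REPEAT CSV files of TsFile read timing results (*readResult-T*csv)
- one CSV file that joins the repeated read results side by side (*readResult-combined.csv)
- one CSV file that joins the write result with the read results (*allResult-combined.csv)
- one CSV file that averages the read results and reports percentages at different granularities (*allResult-combined-processed.csv)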
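The averaging step can be pictured as a per-step mean over the repeated runs. The sketch below is not the code of RLRepeatReadResultAvgPercCalculator; it works on in-memory dictionaries, and the timings are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from statistics import mean
from typing import Dict, List


def average_repeats(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each step's elapsed time (us) over the repeated runs."""
    steps = runs[0].keys()
    return {step: mean(run[step] for run in runs) for step in steps}


# Hypothetical timings (us) from REPEAT=3 runs; not measured data.
runs = [
    {"(B)load_on_disk_chunk": 100.0, "(D_2)decode_pageData": 40.0},
    {"(B)load_on_disk_chunk": 110.0, "(D_2)decode_pageData": 50.0},
    {"(B)load_on_disk_chunk": 120.0, "(D_2)decode_pageData": 60.0},
]

avg = average_repeats(runs)
print(avg)
```

Averaging before computing percentages keeps one slow run from dominating the reported shares.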
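Tool paths:
- WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
- Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the results of several repeated read experiments
- TOOL_PATH: path of RLtool.sh, the helper script that replaces variable values in the scripts
- READ_SCRIPT_PATH: path of RLReadExpScripts.sh
Write-data parameters: see the RLTsFileReadCostBench write-data parameters
Read-data parameters: see the RLTsFileReadCostBench read-data parameters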
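Output, for each compression method: one TsFile, one TsFile space statistics file (*writeResult.csv), REPEAT CSV files of TsFile read timing results (*readResult-T*csv), one CSV file that joins the repeated read results side by side (*readResult-combined.csv), one CSV file that joins the write result with the read results (*allResult-combined.csv), and one CSV file that averages the read results and reports percentages at different granularities (*allResult-combined-processed.csv)
The other scripts are similar and are not described further.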
2. Experimental Results
Varying the compression method
Results on synthetic data
Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
---|---|---|---|---|
dataset | synthetic | synthetic | synthetic | synthetic |
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000 |
chunksWritten(cw) | 10 | 10 | 10 | 10 |
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
valueDataType(vt) | INT64 | INT64 | INT64 | INT64 |
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN |
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
totalPointNum | 100000000 | 100000000 | 100000000 | 100000000 |
tsfileSize(MB) | 767.1312866 | 770.8444319 | 767.9423904 | 781.4226151 |
chunkDataSize_stats_mean(MB) | 76.71300761 | 77.08436436 | 76.7941264 | 78.14216614 |
compressedPageSize_stats_mean(B) | 80375.41867 | 80764.81444 | 80460.47789 | 81874 |
uncompressedPageSize_stats_mean(B) | 81874 | 81874 | 81874 | 81874 |
timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872 |
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
----[1] each step---- | ||||
[Avg&Per] (A)1_index_read_deserialize_MagicString_FileMetadataSize(us) | 26642.8733 us - 0.24422388846191903% | 11918.6528 us - 0.16062988779113208% | 10188.2737 us - 0.1309325873262339% | 10953.8906 us - 0.14619657707769018% |
[Avg&Per] (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us) | 5777.7715 us - 0.05296237408352104% | 5484.9663 us - 0.07392190510886776% | 5140.9507 us - 0.0660679126109081% | 6219.5857 us - 0.08300997092132265% |
[Avg&Per] (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us) | 69234.1118 us - 0.6346396579532295% | 69331.4945 us - 0.9343933722044904% | 67589.2748 us - 0.8686102165735712% | 76722.6646 us - 1.0239823783523703% |
[Avg&Per] (B)4_data_read_deserialize_ChunkHeader(us) | 8684.7625 us - 0.07960952425196037% | 10008.712599999999 us - 0.13488927052826724% | 4487.9059 us - 0.05767543633654741% | 7069.0819 us - 0.09434780888370882% |
[Avg&Per] (B)5_data_read_ChunkData(us) | 5940909.1292 us - 54.457787348789346% | 4839621.4844 us - 65.22447369141621% | 5082130.2731 us - 65.31199351132103% | 4844317.4031 us - 64.65489281142766% |
[Avg&Per] (C)6_data_deserialize_PageHeader(us) | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579% |
[Avg&Per] (D-1)7_data_decompress_PageData(us) | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805% |
[Avg&Per] (D-2)8_data_decode_PageData(us) | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567% |
----[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData---- | ||||
[Avg&Per] (A)get_chunkMetadatas | 101654.7566 us - 0.9318259204986696% | 86735.1136 us - 1.1689451651044902% | 82918.49919999999 us - 1.0656107165107132% | 93896.1409 us - 1.2531889263513831% |
[Avg&Per] (B)load_on_disk_chunk | 5949593.8917000005 us - 54.53739687304131% | 4849630.197000001 us - 65.35936296194448% | 5086618.179 us - 65.36966894765757% | 4851386.484999999 us - 64.74924062031137% |
[Avg&Per] (C)get_pageHeader | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579% |
[Avg&Per] (D_1)decompress_pageData | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805% |
[Avg&Per] (D_2)decode_pageData | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567% |
----[3] D_1 compare each step inside---- | ||||
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 65952.37819999998 us - 2.5136549346918415% | 108809.34350000018 us - 59.6896105506002% | 108132.35939999981 us - 43.622622294156905% | 110765.11740000003 us - 63.813731447511664% |
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 2554687.926599999 us - 97.36728361519128% | 68904.38600000006 us - 37.79892271446546% | 135345.91170000008 us - 54.601079805416944% | 57547.215800000035 us - 33.15396273496119% |
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 811.8335000000624 us - 0.03094155721322126% | 1239.949800000022 us - 0.680200047933345% | 1184.7460000000272 us - 0.47794876167766753% | 1229.1439000000355 us - 0.7081314098345313% |
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 2312.0582000000422 us - 0.08811989290364733% | 3338.2513999999974 us - 1.8312666870009688% | 3218.3657999999955 us - 1.29834913874849% | 4034.201500000018 us - 2.3241744076926314% |
----[3] D_2 compare each step inside---- | ||||
[Avg&Per] (D-2)8_1_createBatchData(us) | 5384.7852 us - 0.053292019060348375% | 5848.7599 us - 0.05759123169122766% | 5913.4963 us - 0.058362326692940975% | 6019.3023 us - 0.05943520403215091% |
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1859842.2956 us - 18.406444711361424% | 1862234.7849 us - 18.336946086748988% | 1864092.3926 us - 18.397368271414525% | 1857778.6739 us - 18.343895858133802% |
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2074757.7936 us - 20.533415498567617% | 2084700.4377 us - 20.527508047369906% | 2063043.8916 us - 20.360888969091857% | 2069930.4964 us - 20.43870456313607% |
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 1876012.952 us - 18.56648209392724% | 1881471.5433999998 us - 18.526365490982297% | 1877809.2412 us - 18.532744562964893% | 1876843.1276 us - 18.53214021585961% |
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1780379.6374 us - 17.620020492363725% | 1780782.3133 us - 17.534904586680103% | 1781949.2049 us - 17.586668929952697% | 1780599.5789 us - 17.5818216126948% |
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2507922.0072 us - 24.82034518471963% | 2540605.1784 us - 25.016684556527476% | 2539577.912 us - 25.063966939883077% | 2536332.2055 us - 25.044002546143567% |
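The category rows in block [2] can be reproduced from the step rows in block [1] by summing the steps that share a category letter and dividing by the total. The sketch below does this for the GZIP column, with the times copied from the table and rounded to four decimals. The last part suggests that the percentages in block [3] are taken relative to the sum of the substeps in the same group; that reading is inferred from the numbers and is not stated in the report.

```python
from typing import Dict

# Averaged per-step times (us), GZIP column of block [1], rounded to 4 decimals.
gzip_steps = {
    "(A)1": 26642.8733,
    "(A)2": 5777.7715,
    "(A)3_2": 69234.1118,
    "(B)4": 8684.7625,
    "(B)5": 5940909.1292,
    "(C)6": 6613.3814,
    "(D-1)7": 2859428.0031,
    "(D-2)8": 1991910.3193,
}

# Averaged substep times (us) inside D-1, GZIP column of block [3].
gzip_d1_substeps = {
    "7_1_ByteBuffer_to_ByteArray": 65952.3782,
    "7_2_decompress_PageDataByteArray": 2554687.9266,
    "7_3_ByteArray_to_ByteBuffer": 811.8335,
    "7_4_split_time_value_Buffer": 2312.0582,
}


def by_category(steps: Dict[str, float]) -> Dict[str, float]:
    """Sum step times per category prefix such as "(A)" or "(D-1)"."""
    totals: Dict[str, float] = {}
    for step, elapsed in steps.items():
        category = step.split(")")[0] + ")"
        totals[category] = totals.get(category, 0.0) + elapsed
    return totals


def percentages(times: Dict[str, float]) -> Dict[str, float]:
    """Share of each entry in the total, in percent."""
    total = sum(times.values())
    return {name: 100.0 * elapsed / total for name, elapsed in times.items()}


categories = by_category(gzip_steps)
category_pct = percentages(categories)
for name in categories:
    print(f"{name}: {categories[name]:.4f} us - {category_pct[name]:.2f}%")

d1_pct = percentages(gzip_d1_substeps)
for name, pct in d1_pct.items():
    print(f"{name}: {pct:.2f}%")
```

For GZIP this gives about 54.54% for category (B) and about 26.21% plus 18.26% for (D-1) and (D-2), in line with block [2].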