1. Experimental Setup

Objectives

  1. Compare the relative magnitudes of the disk-IO cost and the CPU cost of reading a TsFile.
  2. Determine whether the CPU cost contains any clearly dominant, time-consuming operations.

IoTDB Version

  • v0.13.1

Environment

  • FIT building, 166.111.130.101 / 192.168.130.31
  • CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores / 12 threads)
  • L1 cache 384KB, L2 cache 1536KB, L3 cache 12MB
  • Memory: 16GB
  • Disk: 1.8TB HDD, /dev/sdb1 mounted on /disk
  • OS: Ubuntu 16.04.7 LTS
  • Working directory: /disk/rl/tsfileReadExp/

Datasets

  • CRRC (Zhongche) data: /disk/rl/zc_data
  • Synthetic data: timestamps start at 1 and increase with step 1; values are random integers in [0,100)
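The synthetic series described above is simple to reproduce. Below is a minimal sketch; the class and method names are illustrative, not part of RLTsFileReadCostBench:

```java
import java.util.Random;

// Sketch of the synthetic series used in this report: timestamps
// 1, 2, 3, ... (step 1) and values drawn uniformly from [0, 100).
public class SynSeries {

    // timestamps start at 1 and increase by 1
    public static long[] timestamps(int n) {
        long[] t = new long[n];
        for (int i = 0; i < n; i++) t[i] = i + 1L;
        return t;
    }

    // random integer values in [0, 100)
    public static int[] values(int n, long seed) {
        Random r = new Random(seed);
        int[] v = new int[n];
        for (int i = 0; i < n; i++) v[i] = r.nextInt(100);
        return v;
    }

    public static void main(String[] args) {
        long[] t = timestamps(3);
        int[] v = values(3, 42L);
        for (int i = 0; i < t.length; i++) System.out.println(t[i] + "," + v[i]);
    }
}
```

With this shape, a delta-based timestamp encoding sees a constant delta of 1 while the value column is near-random, which is the imbalance noted later in the conclusions.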

Tools

...

Experiment Design

Analyze the TsFile file structure, decompose and categorize the time-consuming steps of reading a TsFile, and then measure each of them experimentally.

TsFile Structure

[Figure: TsFile file structure]
Decomposition and Categorization of the TsFile Read Cost

[Figure: decomposition of the TsFile read process]
Cost categories and their decomposed steps:

  • (A)get_chunkMetadatas
    • (A)1_index_read_deserialize_MagicString_FileMetadataSize
    • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter
    • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet
  • (B)load_on_disk_chunk
    • (B)4_data_read_deserialize_ChunkHeader
    • (B)5_data_read_ChunkData
  • (C)get_pageHeader
    • (C)6_data_deserialize_PageHeader
  • (D-1)decompress_pageData
    • (D-1)7_1_data_ByteBuffer_to_ByteArray
    • (D-1)7_2_data_decompress_PageDataByteArray
    • (D-1)7_3_data_ByteArray_to_ByteBuffer
    • (D-1)7_4_data_split_time_value_Buffer
  • (D-2)decode_pageData
    • (D-2)8_1_createBatchData
    • (D-2)8_2_timeDecoder_hasNext
    • (D-2)8_3_timeDecoder_readLong
    • (D-2)8_4_valueDecoder_read
    • (D-2)8_5_checkValueSatisfyOrNot
    • (D-2)8_6_putIntoBatchData
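The (D-1) substeps can be sketched end to end. The block below is a simplified stand-in that uses java.util.zip (DEFLATE) instead of IoTDB's actual codecs, and assumes an illustrative page layout of [int timeBufferLength][time bytes][value bytes]; it mirrors substeps 7_1-7_4 rather than reproducing IoTDB code:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative version of the (D-1) pipeline, substeps 7_1..7_4.
// DEFLATE is a stand-in codec; the [int timeLen][time][value]
// layout is an assumption for this sketch, not IoTDB's exact format.
public class PageDecompressSketch {

    public static byte[] compress(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        while (!d.finished()) bos.write(tmp, 0, d.deflate(tmp));
        d.end();
        return bos.toByteArray();
    }

    // returns { timeBuffer, valueBuffer }
    public static ByteBuffer[] decompressAndSplit(ByteBuffer page, int rawSize) {
        // 7_1: ByteBuffer -> byte[]
        byte[] in = new byte[page.remaining()];
        page.get(in);
        // 7_2: decompress the page body
        byte[] out = new byte[rawSize];
        Inflater inf = new Inflater();
        inf.setInput(in);
        try {
            int off = 0;
            while (!inf.finished() && off < out.length) {
                int n = inf.inflate(out, off, out.length - off);
                if (n == 0) break; // no further progress possible
                off += n;
            }
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        } finally {
            inf.end();
        }
        // 7_3: byte[] -> ByteBuffer
        ByteBuffer buf = ByteBuffer.wrap(out);
        // 7_4: split into time buffer and value buffer
        int timeLen = buf.getInt();
        byte[] tb = new byte[timeLen];
        buf.get(tb);
        byte[] vb = new byte[buf.remaining()];
        buf.get(vb);
        return new ByteBuffer[] { ByteBuffer.wrap(tb), ByteBuffer.wrap(vb) };
    }

    public static void main(String[] args) {
        // 3 long timestamps + 3 double values behind a 4-byte length prefix
        ByteBuffer raw = ByteBuffer.allocate(4 + 3 * 8 + 3 * 8);
        raw.putInt(3 * 8);
        for (long t = 1; t <= 3; t++) raw.putLong(t);
        for (int i = 0; i < 3; i++) raw.putDouble(i * 1.5);
        byte[] page = compress(raw.array());
        ByteBuffer[] tv = decompressAndSplit(ByteBuffer.wrap(page), raw.capacity());
        System.out.println("first timestamp = " + tv[0].getLong());
    }
}
```

Wrapping each of the four sections with System.nanoTime() calls gives exactly the 7_1..7_4 accumulators measured in the experiments.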


Answering objective 1: analyze the time share of category A, B, C, D-1, and D-2 operations, comparing category B (where the disk-IO cost lies) against category D (where the bulk of the CPU cost lies).

Answering objective 2: analyze the time share of each substep inside D-1 and inside D-2.

Conclusions

  1. The synthetic dataset used here is not representative: its timestamps start at 1 with step 1 and its values are random integers in [0,100), so the timestamp column encodes/compresses extremely well while the value column encodes/compresses poorly. For the value column, PLAIN encoding actually yields the smallest size of all the encodings tried.
  2. From the results on the CRRC ZT11529 dataset:
    • The real dataset compresses well and its on-disk volume is comparatively small; the time to load Chunk data from disk is then smaller than the time to decompress and decode Page data, i.e. disk IO is not the overall bottleneck.
    • Inside D-1, the bottleneck is substep 7_2_data_decompress_PageDataByteArray. Note: in the synthetic-data experiment, substep 7_1_data_ByteBuffer_to_ByteArray(us) also takes a large share, but that is because the synthetic data compresses poorly, so 7_2_data_decompress_PageDataByteArray(us) is relatively cheap and 7_1's share is inflated.
    • Inside D-2, no single substep stands out as a bottleneck.
    • GZIP achieves the highest compression ratio of the compressors tried, but there is a tradeoff between disk-load IO cost and decompression cost, and the overall read time under GZIP is not the smallest.
  3. Follow-ups
    1. Rerun with a larger real dataset; the CRRC data used here is on the order of ten million points.
    2. The space (compression) and time relationship between D-1 decompression and D-2 decoding still needs exploring.
    3. Write-path cost should also be measured.
    4. Note that RLE encoding is lossy for floating-point values.
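The IO-vs-decompression tradeoff in conclusion 2 can be illustrated in miniature with java.util.zip: a higher-effort compressor (standing in for GZIP) buys a smaller on-disk size at a higher CPU cost than a fast one (standing in for LZ4/SNAPPY). This is only an analogy using DEFLATE levels, not a measurement of the actual codecs:

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

// DEFLATE levels as a stand-in for the fast-codec vs. GZIP tradeoff.
public class CompressionTradeoff {

    public static byte[] compress(byte[] raw, int level) {
        Deflater d = new Deflater(level);
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        while (!d.finished()) bos.write(tmp, 0, d.deflate(tmp));
        d.end();
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // low-entropy series, like a slowly varying sensor signal
        byte[] raw = new byte[1 << 20];
        Random r = new Random(7);
        for (int i = 0; i < raw.length; i++) raw[i] = (byte) r.nextInt(8);

        long t0 = System.nanoTime();
        int fast = compress(raw, Deflater.BEST_SPEED).length;
        long t1 = System.nanoTime();
        int small = compress(raw, Deflater.BEST_COMPRESSION).length;
        long t2 = System.nanoTime();

        System.out.printf("BEST_SPEED:       %8d bytes, %6.1f ms%n", fast, (t1 - t0) / 1e6);
        System.out.printf("BEST_COMPRESSION: %8d bytes, %6.1f ms%n", small, (t2 - t1) / 1e6);
    }
}
```

Smaller output means less category-B disk IO per chunk but more D-1 CPU time; which side wins depends on the disk and the data, which is exactly why GZIP did not give the smallest overall read time here.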

Experimental Setup


Datasets

(1) Synthetic data: timestamps start at 1 and increase with step 1; values are random integers in [0,100).

(2) CRRC data: /disk/rl/zc_data/ZT11529.csv. This sensor time series contains 12,780,287 points; a partial plot and a csv excerpt are shown below.

[Figure: partial plot of the ZT11529 series]

[Figure: csv excerpt of ZT11529.csv]

Tools

RLTsFileReadCostBench usage

(1) Write a TsFile from synthetic data

```bash
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_SYN [pagePointNum] [numOfPagesInChunk] [chunksWritten] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
```

  • WRITE_SYN: selects "write synthetic data" among the write-synthetic / write-real / read modes
  • pagePointNum (ppn): number of points per page
  • numOfPagesInChunk (pic): number of pages per chunk
  • chunksWritten (cw): total number of chunks written
  • timeEncoding (te): encoding of the timestamp column
  • valueDataType (vt): data type of the value column
  • valueEncoding (ve): encoding of the value column
  • compressionType (co): compression method

(2) Write a TsFile from a real dataset

```bash
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_REAL [path_of_real_data_csv_to_write] [pagePointNum] [numOfPagesInChunk] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]
```

  • WRITE_REAL: selects "write real data" among the write-synthetic / write-real / read modes
  • path_of_real_data_csv_to_write: path of the real-dataset csv used to write the TsFile
  • pagePointNum (ppn): number of points per page
  • numOfPagesInChunk (pic): number of pages per chunk
  • timeEncoding (te): encoding of the timestamp column
  • valueDataType (vt): data type of the value column
  • valueEncoding (ve): encoding of the value column
  • compressionType (co): compression method
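As a concrete illustration, the CRRC runs reported below use ppn=10000, pic=100, TS_2DIFF timestamps, DOUBLE GORILLA values, and one of four compressors. The command is only assembled and printed here, since the jar and csv live on the experiment machine:

```shell
JAR=RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
CSV=/disk/rl/zc_data/ZT11529.csv
CMD="java -jar $JAR WRITE_REAL $CSV 10000 100 TS_2DIFF DOUBLE GORILLA SNAPPY"
# print the command instead of running it
echo "$CMD"
```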

(3) Read experiment

```bash
java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar READ [path_of_tsfile_to_read] [decomposeMeasureTime] [D_decompose_each_step] (timeEncoding)
```

  • READ: selects "read data" among the write-synthetic / write-real / read modes
  • path_of_tsfile_to_read: path of the TsFile to read
  • decomposeMeasureTime: FALSE to measure the read process as a whole, in which case D_decompose_each_step is ignored; TRUE to measure the decomposed read process, with the decomposition granularity controlled by D_decompose_each_step
  • D_decompose_each_step: when decomposeMeasureTime is TRUE, FALSE measures the "(D_1)decompress_pageData" and "(D_2)decode_pageData" steps without further decomposition, while TRUE breaks these two steps down further and measures the substeps inside
  • timeEncoding (te): defaults to TS_2DIFF if not specified; must be the same encoding used to write the TsFile


Control parameters and the minimal measured steps under each setting:

  • decomposeMeasureTime=FALSE:
    • total_time(us)
  • decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE, used to analyze the time share of the A/B/C/D-1/D-2 categories (objective 1):
    • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
    • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
    • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
    • (B)4_data_read_deserialize_ChunkHeader(us)
    • (B)5_data_read_ChunkData(us)
    • (C)6_data_deserialize_PageHeader(us)
    • (D-1)7_data_decompress_PageData(us)
    • (D-2)8_data_decode_PageData(us)
  • decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE, used to analyze the time share of each substep inside D-1 and D-2 (objective 2):
    • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
    • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
    • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
    • (B)4_data_read_deserialize_ChunkHeader(us)
    • (B)5_data_read_ChunkData(us)
    • (C)6_data_deserialize_PageHeader(us)
    • (D-1)7_1_data_ByteBuffer_to_ByteArray(us)
    • (D-1)7_2_data_decompress_PageDataByteArray(us)
    • (D-1)7_3_data_ByteArray_to_ByteBuffer(us)
    • (D-1)7_4_data_split_time_value_Buffer(us)
    • (D-2)8_1_createBatchData(us)
    • (D-2)8_2_timeDecoder_hasNext(us)
    • (D-2)8_3_timeDecoder_readLong(us)
    • (D-2)8_4_valueDecoder_read(us)
    • (D-2)8_5_checkValueSatisfyOrNot(us)
    • (D-2)8_6_putIntoBatchData(us)
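The three settings map onto three invocations; the commands are only printed here, and the TsFile path is a hypothetical placeholder:

```shell
JAR=RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
TSFILE=/disk/rl/tsfileReadExp/example.tsfile   # hypothetical TsFile path
# whole read, total_time only:
echo "java -jar $JAR READ $TSFILE FALSE FALSE TS_2DIFF"
# A/B/C/D-1/D-2 category breakdown (objective 1):
echo "java -jar $JAR READ $TSFILE TRUE FALSE TS_2DIFF"
# D-1/D-2 substep breakdown (objective 2):
echo "java -jar $JAR READ $TSFILE TRUE TRUE TS_2DIFF"
```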


Automation scripts

(1) RLUnitSynExp.sh: write a TsFile from synthetic data, drop the system caches, then run one TsFile read experiment.

  • Input: see the write and read parameters of RLTsFileReadCostBench
  • Output: one TsFile, one TsFile space-statistics file (*writeResult.csv), and one read-cost csv file (*readResult-T*csv)

(2) RLUnitRealExp.sh: write a TsFile from real data, drop the system caches, then run one TsFile read experiment.

  • Input: see the write and read parameters of RLTsFileReadCostBench
  • Output: one TsFile, one TsFile space-statistics file (*writeResult.csv), and one read-cost csv file (*readResult-T*csv)

(3) RLReadExpScripts.sh: repeat the read experiment several times, aggregate the read results, join the write-side space results with the read-side cost results, and finally compute averages and percentages over the repeated read results.

  • Input:
    • WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
    • Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which averages the repeated read results and computes percentages
    • FILE_NAME: path of the TsFile to read
    • decomposeMeasureTime: see the RLTsFileReadCostBench read parameters
    • D_decompose_each_step: see the RLTsFileReadCostBench read parameters
    • te: see the RLTsFileReadCostBench read parameters
    • REPEAT: number of read repetitions
  • Output:
    • REPEAT read-cost csv files (*readResult-T*csv)
    • one csv concatenating the repeated read results side by side (*readResult-combined.csv)
    • one csv joining the write results with the read results (*allResult-combined.csv)
    • one csv with the read results averaged and percentages computed at different granularities (*allResult-combined-processed.csv)
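The post-processing done by the calculator jar amounts to a per-step mean over the REPEAT runs plus each step's percentage of the summed mean. A minimal sketch (the rows-by-repeats data layout is an assumption, not the jar's actual csv format):

```java
// Sketch of the averaging + percentage step performed by
// RLRepeatReadResultAvgPercCalculator; layout is assumed.
public class AvgPercSketch {

    // mean over repeats for each step (rows = steps, cols = repeats)
    public static double[] stepMeans(double[][] usByStepByRepeat) {
        double[] m = new double[usByStepByRepeat.length];
        for (int s = 0; s < m.length; s++) {
            double sum = 0;
            for (double v : usByStepByRepeat[s]) sum += v;
            m[s] = sum / usByStepByRepeat[s].length;
        }
        return m;
    }

    // each step's share of the summed mean cost, in percent
    public static double[] percents(double[] means) {
        double total = 0;
        for (double v : means) total += v;
        double[] p = new double[means.length];
        for (int s = 0; s < p.length; s++) p[s] = 100.0 * means[s] / total;
        return p;
    }

    public static void main(String[] args) {
        double[][] runs = { {100, 110, 90}, {300, 310, 290} }; // two steps, three repeats
        double[] m = stepMeans(runs);
        double[] p = percents(m);
        System.out.printf("step0: %.1f us (%.1f%%), step1: %.1f us (%.1f%%)%n", m[0], p[0], m[1], p[1]);
    }
}
```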

(4) RLCompressionExpScripts.sh: for each compression setting (UNCOMPRESSED, SNAPPY, GZIP, LZ4), write a TsFile, drop the system caches, run several repeated read experiments, aggregate the read results, join the write-side space results with the read-side cost results, and finally compute averages and percentages.

  • Input:
    • Tool paths:
      • WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
      • Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which averages the repeated read results and computes percentages
      • TOOL_PATH: path of RLtool.sh, a helper that substitutes variable values in the scripts
      • READ_SCRIPT_PATH: path of RLReadExpScripts.sh
    • Write parameters: see the RLTsFileReadCostBench write parameters
    • Read parameters: see the RLTsFileReadCostBench read parameters
    • REPEAT: number of read repetitions
  • Output, per compression setting: one TsFile, one space-statistics file (*writeResult.csv), REPEAT read-cost csv files (*readResult-T*csv), one csv concatenating the repeated read results (*readResult-combined.csv), one csv joining the write and read results (*allResult-combined.csv), and one csv with averages and percentages at different granularities (*allResult-combined-processed.csv)

The remaining scripts are analogous and not repeated here.

Detailed Results

CRRC data results

Varying the compression method

RLCompressionRealExpScripts.sh

| parameter | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv |
| pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
| numOfPagesInChunk(pic) | 100 | 100 | 100 | 100 |
| chunksWritten(cw) | 13 | 13 | 13 | 13 |
| timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
| valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| valueEncoding(ve) | GORILLA | GORILLA | GORILLA | GORILLA |
| compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
| totalPointNum | 12780287 | 12780287 | 12780287 | 12780287 |
| tsfileSize(MB) | 19.35 | 23.78 | 23.16 | 36.31 |
| chunkDataSize_stats_mean(MB) | 1.52 | 1.86 | 1.81 | 2.84 |
| compressedPageSize_stats_mean(B) | 15824.17 | 19440.28 | 18948.69 | 29684.76 |
| uncompressedPageSize_stats_mean(B) | 29684.76 | 29684.76 | 29684.76 | 29684.76 |
| timeBufferSize_stats_mean(B) | 11461.46 | 11461.46 | 11461.46 | 11461.46 |
| valueBufferSize_stats_mean(B) | 18221.27 | 18221.27 | 18221.27 | 18221.27 |
[2] category: (A)get ChunkStatistic -> (B)load on-disk Chunk -> (C)get PageStatistics -> (D)load in-memory PageData

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (A)get_chunkMetadatas | 87316.9 us / 8.20% | 100289.6 us / 12.59% | 89466.6 us / 11.20% | 88760.8 us / 10.34% |
| (B)load_on_disk_chunk | 160699.6 us / 15.09% | 176784.9 us / 22.19% | 105802.9 us / 13.24% | 191518.9 us / 22.30% |
| (C)get_pageHeader | 2436.5 us / 0.23% | 2198.4 us / 0.28% | 2320.0 us / 0.29% | 3134.2 us / 0.36% |
| (D_1)decompress_pageData | 356587.5 us / 33.48% | 31158.2 us / 3.91% | 115629.7 us / 14.47% | 29641.0 us / 3.45% |
| (D_2)decode_pageData | 457951.0 us / 43.00% | 486085.0 us / 61.03% | 485623.3 us / 60.79% | 545723.8 us / 63.55% |
| SUM | 1064991.5 us | 796516.1 us | 798842.5 us | 858778.6 us |

[3] D_1 compare each step inside

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (D-1)7_1_data_ByteBuffer_to_ByteArray | 1269.8 us / 0.36% | 1808.6 us / 7.21% | 1688.0 us / 1.41% | 3197.6 us / 41.87% |
| (D-1)7_2_data_decompress_PageDataByteArray | 345856.2 us / 99.09% | 21247.4 us / 84.67% | 116100.7 us / 97.14% | 2432.5 us / 31.85% |
| (D-1)7_3_data_ByteArray_to_ByteBuffer | 374.2 us / 0.11% | 442.1 us / 1.76% | 424.3 us / 0.35% | 421.2 us / 5.51% |
| (D-1)7_4_data_split_time_value_Buffer | 1547.4 us / 0.44% | 1594.9 us / 6.36% | 1301.3 us / 1.09% | 1586.4 us / 20.77% |

[3] D_2 compare each step inside

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (D-2)8_1_createBatchData | 3430.9 us / 0.23% | 4174.9 us / 0.28% | 3438.1 us / 0.23% | 3730.6 us / 0.25% |
| (D-2)8_2_timeDecoder_hasNext | 234016.8 us / 15.59% | 236135.9 us / 15.56% | 235278.4 us / 15.65% | 234469.6 us / 15.54% |
| (D-2)8_3_timeDecoder_readLong | 357893.9 us / 23.84% | 360253.1 us / 23.74% | 358341.2 us / 23.84% | 363063.1 us / 24.06% |
| (D-2)8_4_valueDecoder_read | 353821.7 us / 23.57% | 359477.9 us / 23.69% | 356440.5 us / 23.71% | 355773.2 us / 23.58% |
| (D-2)8_5_checkValueSatisfyOrNot | 223758.3 us / 14.91% | 224938.3 us / 14.83% | 224282.2 us / 14.92% | 226053.7 us / 14.98% |
| (D-2)8_6_putIntoBatchData | 328221.6 us / 21.86% | 332305.6 us / 21.90% | 325251.8 us / 21.64% | 325993.7 us / 21.60% |


Supplementary experiment

Use PLAIN encoding for both the timestamp and value columns, then vary the compression method.

RLCompressionRealExpScripts.sh

| parameter | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv |
| pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
| numOfPagesInChunk(pic) | 100 | 100 | 100 | 100 |
| chunksWritten(cw) | 13 | 13 | 13 | 13 |
| timeEncoding(te) | PLAIN | PLAIN | PLAIN | PLAIN |
| valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN |
| compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
| totalPointNum | 12780287 | 12780287 | 12780287 | 12780287 |
| tsfileSize(MB) | 50.76 | 74.71 | 75.82 | 195.09 |
| chunkDataSize_stats_mean(MB) | 3.98 | 5.85 | 5.93 | 15.27 |
| compressedPageSize_stats_mean(B) | 41639.29 | 61236.76 | 62145.18 | 160003 |
| uncompressedPageSize_stats_mean(B) | 160003 | 160003 | 160003 | 160003 |
| timeBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
| valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |
| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| total_time | 2096685.6 us | 1244193.0 us | 1195582.1 us | 1895095.4 us |

[2] category: (A)get ChunkStatistic -> (B)load on-disk Chunk -> (C)get PageStatistics -> (D)load in-memory PageData

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (A)get_chunkMetadatas | 86791.4 us / 4.18% | 100870.0 us / 8.73% | 99547.1 us / 7.91% | 88166.6 us / 4.66% |
| (B)load_on_disk_chunk | 349328.6 us / 16.84% | 452828.3 us / 39.20% | 450015.8 us / 35.77% | 1155773.9 us / 61.09% |
| (C)get_pageHeader | 2818.3 us / 0.14% | 3913.9 us / 0.34% | 3502.1 us / 0.28% | 4450.3 us / 0.24% |
| (D_1)decompress_pageData | 1350175.8 us / 65.10% | 261417.9 us / 22.63% | 395144.7 us / 31.41% | 173786.0 us / 9.19% |
| (D_2)decode_pageData | 285009.9 us / 13.74% | 336215.8 us / 29.10% | 309969.8 us / 24.64% | 469824.4 us / 24.83% |
| SUM | 2074124.0 us | 1155245.8 us | 1258179.4 us | 1892001.2 us |

[3] D_1 compare each step inside

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (D-1)7_1_data_ByteBuffer_to_ByteArray | 4312.1 us / 0.34% | 10247.4 us / 5.21% | 9355.3 us / 2.65% | 33088.6 us / 56.72% |
| (D-1)7_2_data_decompress_PageDataByteArray | 1274619.9 us / 99.49% | 183796.9 us / 93.37% | 341381.8 us / 96.64% | 21318.8 us / 36.54% |
| (D-1)7_3_data_ByteArray_to_ByteBuffer | 454.7 us / 0.04% | 659.1 us / 0.33% | 604.5 us / 0.17% | 973.5 us / 1.67% |
| (D-1)7_4_data_split_time_value_Buffer | 1779.4 us / 0.14% | 2148.4 us / 1.09% | 1902.0 us / 0.54% | 2956.6 us / 5.07% |
[3] D_2 compare each step inside

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (D-2)8_1_createBatchData | 3343.4 us / 0.26% | 3522.6 us / 0.27% | 3458.9 us / 0.27% | 3730.6 us / 0.28% |
| (D-2)8_2_timeDecoder_hasNext | 232202.5 us / 18.06% | 231677.3 us / 17.90% | 231911.8 us / 17.92% | 237390.9 us / 17.92% |
| (D-2)8_3_timeDecoder_readLong | 254086.7 us / 19.76% | 255389.4 us / 19.73% | 255976.6 us / 19.78% | 261059.1 us / 19.71% |
| (D-2)8_4_valueDecoder_read | 241634.0 us / 18.79% | 242640.7 us / 18.74% | 242329.0 us / 18.73% | 246610.6 us / 18.62% |
| (D-2)8_5_checkValueSatisfyOrNot | 230100.9 us / 17.89% | 231053.6 us / 17.85% | 231043.9 us / 17.85% | 235200.0 us / 17.76% |
| (D-2)8_6_putIntoBatchData | 324669.9 us / 25.25% | 330206.1 us / 25.51% | 329318.8 us / 25.45% | 340641.2 us / 25.72% |


  • Once both the timestamp column and the value column use PLAIN encoding, the compressor is responsible for the entire compression ratio, and the time share of D-1 rises markedly. Even so, for every compressor except GZIP the D-1 share still does not exceed 60%, and the D-2 decoding step retains a substantial baseline cost.

Varying the value-column encoding

RLValueEncodingRealExpScripts.sh
| parameter | GORILLA | PLAIN | RLE | TS_2DIFF |
|---|---|---|---|---|
| dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv |
| pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
| numOfPagesInChunk(pic) | 100 | 100 | 100 | 100 |
| chunksWritten(cw) | 13 | 13 | 13 | 13 |
| timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
| valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF |
| compression(co) | SNAPPY | SNAPPY | SNAPPY | SNAPPY |
| totalPointNum | 12780287 | 12780287 | 12780287 | 12780287 |
| tsfileSize(MB) | 23.16 | 26.60 | 22.10 | 20.99 |
| chunkDataSize_stats_mean(MB) | 1.81 | 2.08 | 1.74 | 1.64 |
| compressedPageSize_stats_mean(B) | 18948.69 | 21758.62 | 18182.30 | 17160.89 |
| uncompressedPageSize_stats_mean(B) | 29684.76 | 91463.49 | 25245.93 | 22210.36 |
| timeBufferSize_stats_mean(B) | 11461.46 | 11461.46 | 11461.46 | 11461.46 |
| valueBufferSize_stats_mean(B) | 18221.27 | 80000 | 13782.45 | 10746.88 |
[2] category: (A)get ChunkStatistic -> (B)load on-disk Chunk -> (C)get PageStatistics -> (D)load in-memory PageData

| [Avg & %] | GORILLA | PLAIN | RLE | TS_2DIFF |
|---|---|---|---|---|
| (A)get_chunkMetadatas | 88753.0 us / 11.21% | 97256.9 us / 12.18% | 84509.5 us / 9.50% | 86370.8 us / 11.67% |
| (B)load_on_disk_chunk | 105215.9 us / 13.29% | 138228.6 us / 17.31% | 139816.6 us / 15.71% | 96826.5 us / 13.08% |
| (C)get_pageHeader | 2352.2 us / 0.30% | 2625.5 us / 0.33% | 3388.6 us / 0.38% | 3159.3 us / 0.43% |
| (D_1)decompress_pageData | 114457.3 us / 14.46% | 182862.4 us / 22.90% | 155021.8 us / 17.42% | 120684.5 us / 16.30% |
| (D_2)decode_pageData | 480759.0 us / 60.74% | 377408.4 us / 47.27% | 506978.1 us / 56.98% | 433147.3 us / 58.52% |
| SUM | 791537.4 us | 798381.8 us | 889714.6 us | 740188.4 us |

[3] D_1 compare each step inside

| [Avg & %] | GORILLA | PLAIN | RLE | TS_2DIFF |
|---|---|---|---|---|
| (D-1)7_1_data_ByteBuffer_to_ByteArray | 1754.4 us / 1.69% | 2320.4 us / 1.49% | 1825.2 us / 1.55% | 1607.3 us / 1.76% |
| (D-1)7_2_data_decompress_PageDataByteArray | 100527.6 us / 96.61% | 151532.7 us / 97.29% | 113686.7 us / 96.46% | 87212.8 us / 95.40% |
| (D-1)7_3_data_ByteArray_to_ByteBuffer | 432.0 us / 0.42% | 468.2 us / 0.30% | 474.3 us / 0.40% | 663.4 us / 0.73% |
| (D-1)7_4_data_split_time_value_Buffer | 1340.4 us / 1.29% | 1429.2 us / 0.92% | 1873.8 us / 1.59% | 1933.0 us / 2.11% |

[3] D_2 compare each step inside

| [Avg & %] | GORILLA | PLAIN | RLE | TS_2DIFF |
|---|---|---|---|---|
| (D-2)8_1_createBatchData | 3348.9 us / 0.22% | 3442.3 us / 0.24% | 3466.7 us / 0.22% | 3338.6 us / 0.22% |
| (D-2)8_2_timeDecoder_hasNext | 235317.7 us / 15.59% | 244839.8 us / 17.30% | 239308.9 us / 15.38% | 240223.0 us / 15.97% |
| (D-2)8_3_timeDecoder_readLong | 361562.5 us / 23.95% | 363364.4 us / 25.67% | 362143.6 us / 23.27% | 357833.1 us / 23.79% |
| (D-2)8_4_valueDecoder_read | 356325.5 us / 23.60% | 241625.6 us / 17.07% | 396438.6 us / 25.48% | 335421.7 us / 22.30% |
Synthetic-data results

Varying the compression method

RLCompressionSynExpScripts.sh

| parameter | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| dataset | synthetic | synthetic | synthetic | synthetic |
| pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000 |
| numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000 |
| chunksWritten(cw) | 10 | 10 | 10 | 10 |
| timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF |
| valueDataType(vt) | INT64 | INT64 | INT64 | INT64 |
| valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN |
| compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
| totalPointNum | 100000000 | 100000000 | 100000000 | 100000000 |
| tsfileSize(MB) | 767.13 | 770.84 | 767.94 | 781.42 |
| chunkDataSize_stats_mean(MB) | 76.71 | 77.08 | 76.79 | 78.14 |
| compressedPageSize_stats_mean(B) | 80375.42 | 80764.81 | 80460.48 | 81874 |
| uncompressedPageSize_stats_mean(B) | 81874 | 81874 | 81874 | 81874 |
| timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872 |
| valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000 |

[1] each step

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (A)1_index_read_deserialize_MagicString_FileMetadataSize | 26642.9 us / 0.24% | 11918.7 us / 0.16% | 10188.3 us / 0.13% | 10953.9 us / 0.15% |
| (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter | 5777.8 us / 0.05% | 5485.0 us / 0.07% | 5141.0 us / 0.07% | 6219.6 us / 0.08% |
| (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet | 69234.1 us / 0.63% | 69331.5 us / 0.93% | 67589.3 us / 0.87% | 76722.7 us / 1.02% |
| (B)4_data_read_deserialize_ChunkHeader | 8684.8 us / 0.08% | 10008.7 us / 0.13% | 4487.9 us / 0.06% | 7069.1 us / 0.09% |
| (B)5_data_read_ChunkData | 5940909.1 us / 54.46% | 4839621.5 us / 65.22% | 5082130.3 us / 65.31% | 4844317.4 us / 64.65% |
| (C)6_data_deserialize_PageHeader | 6613.4 us / 0.06% | 7120.2 us / 0.10% | 7692.3 us / 0.10% | 7605.7 us / 0.10% |
| (D-1)7_data_decompress_PageData | 2859428.0 us / 26.21% | 521804.3 us / 7.03% | 605202.8 us / 7.78% | 498170.4 us / 6.65% |
| (D-2)8_data_decode_PageData | 1991910.3 us / 18.26% | 1954657.4 us / 26.34% | 1998880.6 us / 25.69% | 2041517.9 us / 27.25% |
[2] category: (A)get ChunkStatistic -> (B)load on-disk Chunk -> (C)get PageStatistics -> (D)load in-memory PageData

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (A)get_chunkMetadatas | 101654.8 us / 0.93% | 86735.1 us / 1.17% | 82918.5 us / 1.07% | 93896.1 us / 1.25% |
| (B)load_on_disk_chunk | 5949593.9 us / 54.54% | 4849630.2 us / 65.36% | 5086618.2 us / 65.37% | 4851386.5 us / 64.75% |
| (C)get_pageHeader | 6613.4 us / 0.06% | 7120.2 us / 0.10% | 7692.3 us / 0.10% | 7605.7 us / 0.10% |
| (D_1)decompress_pageData | 2859428.0 us / 26.21% | 521804.3 us / 7.03% | 605202.8 us / 7.78% | 498170.4 us / 6.65% |
| (D_2)decode_pageData | 1991910.3 us / 18.26% | 1954657.4 us / 26.34% | 1998880.6 us / 25.69% | 2041517.9 us / 27.25% |
[3] D_1 compare each step inside

| [Avg & %] | GZIP | LZ4 | SNAPPY | UNCOMPRESSED |
|---|---|---|---|---|
| (D-1)7_1_data_ByteBuffer_to_ByteArray | 65952.4 us / 2.51% | 108809.3 us / 59.69% | 108132.4 us / 43.62% | 110765.1 us / 63.81% |
| (D-1)7_2_data_decompress_PageDataByteArray | 2554687.9 us / 97.37% | 68904.4 us / 37.80% | 135345.9 us / 54.60% | 57547.2 us / 33.15% |
| (D-1)7_3_data_ByteArray_to_ByteBuffer | 811.8 us / 0.03% | 1239.9 us / 0.68% | 1184.7 us / 0.48% | 1229.1 us / 0.71% |
| (D-1)7_4_data_split_time_value_Buffer | 2312.1 us / 0.09% | 3338.3 us / 1.83% | 3218.4 us / 1.30% | 4034.2 us / 2.32% |
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us): 5384.7852 us - 0.053292019060348375% | 5848.7599 us - 0.05759123169122766% | 5913.4963 us - 0.058362326692940975% | 6019.3023 us - 0.05943520403215091%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us): 1859842.2956 us - 18.406444711361424% | 1862234.7849 us - 18.336946086748988% | 1864092.3926 us - 18.397368271414525% | 1857778.6739 us - 18.343895858133802%
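Substeps 8_1..8_6 correspond to one pass of the page-decoding loop: create a BatchData container, then per point ask the time decoder hasNext/readLong, read a value, check whether it satisfies the filter, and append. The sketch below is our own schematic Python with per-substep timers, not IoTDB's Java implementation; the keys merely mirror the instrumented step names:

```python
import time

def decode_page(timestamps, values, satisfy=lambda t, v: True):
    """Schematic D-2 loop; cost[k] accumulates nanoseconds per substep."""
    cost = {k: 0 for k in ("8_1_create", "8_2_hasNext", "8_3_readLong",
                           "8_4_valueRead", "8_5_check", "8_6_put")}

    def timed(key, fn, *a):
        t0 = time.perf_counter_ns()
        out = fn(*a)
        cost[key] += time.perf_counter_ns() - t0
        return out

    batch = timed("8_1_create", list)                      # 8_1_createBatchData
    it = iter(range(len(timestamps)))
    while True:
        i = timed("8_2_hasNext", lambda: next(it, None))   # 8_2_timeDecoder_hasNext
        if i is None:
            break
        t = timed("8_3_readLong", lambda: timestamps[i])   # 8_3_timeDecoder_readLong
        v = timed("8_4_valueRead", lambda: values[i])      # 8_4_valueDecoder_read
        if timed("8_5_check", satisfy, t, v):              # 8_5_checkValueSatisfyOrNot
            timed("8_6_put", batch.append, (t, v))         # 8_6_putIntoBatchData
    return batch, cost
```

Because every point pays every substep, none of them is an obvious place to win big; this matches the flat D-2 distribution measured above.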
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us): 2074757.7936 us - 20.533415498567617% | 2084700.4377 us - 20.527508047369906% | 2063043.8916 us - 20.360888969091857% | 2069930.4964 us - 20.43870456313607%
[Avg&Per] (D-2)8_4_valueDecoder_read(us): 1876012.952 us - 18.56648209392724% | 1881471.5433999998 us - 18.526365490982297% | 1877809.2412 us - 18.532744562964893% | 1876843.1276 us - 18.53214021585961%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us): 1780379.6374 us - 17.620020492363725% | 1780782.3133 us - 17.534904586680103% | 1781949.2049 us - 17.586668929952697% | 1780599.5789 us - 17.5818216126948%
[Avg&Per] (D-2)8_6_putIntoBatchData(us): 2507922.0072 us - 24.82034518471963% | 2540605.1784 us - 25.016684556527476% | 2539577.912 us - 25.063966939883077% | 2536332.2055 us - 25.044002546143567%

• Category-B time exceeds category-D time here. Likely cause: the synthetic values are random INT64 integers stored with PLAIN encoding, and none of the four compression methods compresses them well, so the on-disk data volume is comparatively large.
• Inside D-1, substep 7_1_data_ByteBuffer_to_ByteArray takes a comparatively large share. Likely cause: the synthetic data compresses poorly, so decompression (7_2_data_decompress_PageDataByteArray) is comparatively cheap and the relative share of 7_1 grows.

Changing the value-column encoding

Synthetic data results

RLValueEncodingSynExpScripts.sh

encoding: GORILLA | PLAIN | RLE | TS_2DIFF
dataset: synthetic | synthetic | synthetic | synthetic
pagePointNum(ppn): 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic): 1000 | 1000 | 1000 | 1000
chunksWritten(cw): 10 | 10 | 10 | 10
timeEncoding(te): TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt): INT64 | INT64 | INT64 | INT64
valueEncoding(ve): GORILLA | PLAIN | RLE | TS_2DIFF
compression(co): UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED
totalPointNum: 100000000 | 100000000 | 100000000 | 100000000
tsfileSize(MB): 805.3812895 | 781.4226151 | 781.8422318 | 793.3244705
chunkDataSize_stats_mean(MB): 80.53803624 | 78.14216614 | 78.18412781 | 79.33235168
compressedPageSize_stats_mean(B): 84386.25189 | 81874 | 81918 | 83122
uncompressedPageSize_stats_mean(B): 84386.25189 | 81874 | 81918 | 83122
timeBufferSize_stats_mean(B): 1872 | 1872 | 1872 | 1872
valueBufferSize_stats_mean(B): 82512.25189 | 80000 | 80044 | 81248

(columns below: GORILLA | PLAIN | RLE | TS_2DIFF, in table order)
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas: 91331.98490000001 us - 0.8580518676474486% | 100944.7581 us - 1.2939556377951902% | 88098.20449999999 us - 0.9671805828234409% | 88231.2157 us - 0.9257222461823116%
[Avg&Per] (B)load_on_disk_chunk: 5552645.935400001 us - 52.16637107440095% | 5170158.3812 us - 66.27343223726832% | 5270914.364100001 us - 57.866400973957255% | 5526099.6186 us - 57.97985793317826%
[Avg&Per] (C)get_pageHeader: 8185.805399999992 us - 0.07690455451459878% | 7712.402700000001 us - 0.09886107156476356% | 7813.90999999998 us - 0.08578451820695045% | 7725.986500000003 us - 0.08106107934716146%
[Avg&Per] (D_1)decompress_pageData: 548160.3352000009 us - 5.149893543905802% | 525441.0348000005 us - 6.7353412114264035% | 585036.2351000007 us - 6.422783415941812% | 632154.6457000006 us - 6.632568914631739%
[Avg&Per] (D_2)decode_pageData: 4443785.968300003 us - 41.748778959531215% | 1996996.816400002 us - 25.598409841945312% | 3156902.088299994 us - 34.65785050907054% | 3276856.417400001 us - 34.38078982666053%
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us): 110421.72189999989 us - 64.93156378407389% | 109687.91739999971 us - 63.92643734398307% | 113658.35759999987 us - 62.901942587101885% | 109187.47799999996 us - 63.08420663361515%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us): 54624.26410000002 us - 32.12084386602246% | 57095.555900000094 us - 33.27545607007074% | 62164.880499999985 us - 34.403908579311135% | 59072.25890000005 us - 34.12961499817788%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us): 1179.3427000000347 us - 0.6934918640164246% | 1234.10400000004 us - 0.7192394012210651% | 1032.115900000022 us - 0.5712038820191121% | 1193.2719000000652 us - 0.6894253122110825%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us): 3833.2921999999994 us - 2.254100485887217% | 3567.0158000000133 us - 2.0788671847251163% | 3835.9775000000177 us - 2.122944951567873% | 3629.1045000000054 us - 2.0967530559958814%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us): 6008.9294 us - 0.04720551599260821% | 6005.094 us - 0.058953749593284185% | 9136.1988 us - 0.07959246314858166% | 6219.250599999999 us - 0.053213386787106826%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us): 1795067.4479 us - 14.10186066084482% | 1862631.8749 us - 18.285997377780266% | 1805100.2815999999 us - 15.725618584694368% | 1838661.8702 us - 15.732028111177547%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us): 2073615.2138 us - 16.290102549307967% | 2089493.8765 us - 20.513167449482335% | 2063469.3846 us - 17.976470800088325% | 2172029.5514 us - 18.58440125112089%
[Avg&Per] (D-2)8_4_valueDecoder_read(us): 4636195.4124 us - 36.42146247963989% | 1880348.902 us - 18.459930571697104% | 3352494.9242 us - 29.206164899804453% | 3364558.2458 us - 28.787960289219587%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us): 1724239.435 us - 13.545443257159627% | 1784205.2864 us - 17.516060810611705% | 1723128.802 us - 15.01150190311584% | 1780807.3196 us - 15.237010821076382%
[Avg&Per] (D-2)8_6_putIntoBatchData(us): 2494168.5891 us - 19.59392553705509% | 2563425.3348 us - 25.165890040835308% | 2525393.9444 us - 22.000651349148434% | 2525103.5280999998 us - 21.605386140618513%

• For this synthetic data the value column encodes poorly: PLAIN actually yields the smallest value buffer (80000 B, versus 80044 to 82512 B for the other encodings), and GORILLA shifts cost into decoding, with 8_4_valueDecoder_read at 36.4% of D-2 versus 18.5% under PLAIN.

Zhongche (CRRC) data results

RLCompressionRealExpScripts.sh

The ZT11529 sensor data is shown in the figure below; it contains 12,780,287 points in total.

Image Removed

compression: GZIP | LZ4 | SNAPPY | UNCOMPRESSED
dataset: /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn): 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic): 100 | 100 | 100 | 100
chunksWritten(cw): 13 | 13 | 13 | 13
timeEncoding(te): TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt): DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve): GORILLA | GORILLA | GORILLA | GORILLA
compression(co): GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum: 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB): 19.34862614 | 23.77741051 | 23.15641212 | 36.30773735
chunkDataSize_stats_mean(MB): 1.515139659 | 1.86007007 | 1.813184341 | 2.837062438
compressedPageSize_stats_mean(B): 15824.16667 | 19440.275 | 18948.69 | 29684.7575
uncompressedPageSize_stats_mean(B): 29684.7575 | 29684.7575 | 29684.7575 | 29684.7575
timeBufferSize_stats_mean(B): 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B): 18221.26833 | 18221.26833 | 18221.26833 | 18221.26833
[1] each step
[Avg&Per] (A)1_index_read_deserialize_MagicString_FileMetadataSize(us): 10294.8866 us - 0.9666637540862395% | 17480.267 us - 2.1945906166045948% | 10013.4112 us - 1.2534900973847278% | 12118.2366 us - 1.4111012934589997%
[Avg&Per] (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us): 5024.5339 us - 0.4717909959598364% | 8883.9004 us - 1.115344774578661% | 4736.353 us - 0.5929020055841159% | 5782.8723 us - 0.6733833355290506%
[Avg&Per] (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us): 71997.4296 us - 6.760376125143112% | 73925.4742 us - 9.281102628888053% | 74716.8403 us - 9.353138261607208% | 70859.6681 us - 8.251214480330036%
[Avg&Per] (B)4_data_read_deserialize_ChunkHeader(us): 8457.0026 us - 0.7940911055429293% | 4862.6532 us - 0.6104902793831642% | 4758.252799999999 us - 0.5956434472253724% | 2203.0548 us - 0.25653348589027036%
[Avg&Per] (B)5_data_read_ChunkData(us): 152242.5999 us - 14.295194193900432% | 171922.22919999997 us - 21.584276200590327% | 101044.6856 us - 12.648887603152982% | 189315.798 us - 22.04477237472181%
[Avg&Per] (C)6_data_deserialize_PageHeader(us): 2436.5129999999995 us - 0.22878239411203669% | 2198.3668000000007 us - 0.27599779517870476% | 2319.9517000000005 us - 0.29041416798711567% | 3134.228800000001 us - 0.3649635223062446%
[Avg&Per] (D-1)7_data_decompress_PageData(us): 356587.5454999999 us - 33.482666568996265% | 31158.160799999983 us - 3.911805656191469% | 115629.7179 us - 14.474658381255693% | 29640.983400000016 us - 3.4515277590088287%
[Avg&Per] (D-2)8_data_decode_PageData(us): 457950.96670000016 us - 43.000434862259155% | 486085.0215000002 us - 61.02639204858503% | 485623.25309999986 us - 60.790866035802786% | 545723.8052999998 us - 63.54650374875476%
(columns below: GZIP | LZ4 | SNAPPY | UNCOMPRESSED, in table order)
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas: 87316.85010000001 us - 8.19883087518919% | 100289.6416 us - 12.59103802007131% | 89466.6045 us - 11.199530364576052% | 88760.777 us - 10.335699109318087%
[Avg&Per] (B)load_on_disk_chunk: 160699.6025 us - 15.089285299443361% | 176784.88239999997 us - 22.19476647997349% | 105802.9384 us - 13.244531050378352% | 191518.85280000002 us - 22.301305860612082%
[Avg&Per] (C)get_pageHeader: 2436.5129999999995 us - 0.22878239411203669% | 2198.3668000000007 us - 0.27599779517870476% | 2319.9517000000005 us - 0.29041416798711567% | 3134.228800000001 us - 0.3649635223062446%
[Avg&Per] (D_1)decompress_pageData: 356587.5454999999 us - 33.482666568996265% | 31158.160799999983 us - 3.911805656191469% | 115629.7179 us - 14.474658381255693% | 29640.983400000016 us - 3.4515277590088287%
[Avg&Per] (D_2)decode_pageData: 457950.96670000016 us - 43.000434862259155% | 486085.0215000002 us - 61.02639204858503% | 485623.25309999986 us - 60.790866035802786% | 545723.8052999998 us - 63.54650374875476%
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us): 1269.7615999999996 us - 0.36377892330196115% | 1808.5776999999991 us - 7.207505165727991% | 1687.9916000000003 us - 1.412377327892232% | 3197.6313 us - 41.86667970732365%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us): 345856.23130000004 us - 99.08569249502276% | 21247.39440000001 us - 84.67466169480036% | 116100.69619999993 us - 97.14384305311928% | 2432.4552000000012 us - 31.84820675254647%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us): 374.18470000000025 us - 0.10720162531460037% | 442.10720000000003 us - 1.761876157051776% | 424.26030000000026 us - 0.35498732863644405% | 421.1930000000002 us - 5.51469221169019%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us): 1547.4220999999993 us - 0.443326956360674% | 1594.8989000000004 us - 6.355956982419887% | 1301.2614999999998 us - 1.0887922903520593% | 1586.372500000001 us - 20.770421328439692%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us): 3430.8672 us - 0.22855030377902266% | 4174.8958 us - 0.275155541260775% | 3438.1465 us - 0.22874736939899784% | 3730.6205 us - 0.24721094086707432%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us): 234016.77980000002 us - 15.589238228946503% | 236135.9462 us - 15.56305048087338% | 235278.4135 us - 15.65358490817499% | 234469.6457 us - 15.537217392727715%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us): 357893.9434 us - 23.841426880277478% | 360253.1425 us - 23.743262865502565% | 358341.1524 us - 23.84121675993312% | 363063.0738 us - 24.058508247673558%
[Avg&Per] (D-2)8_4_valueDecoder_read(us): 353821.6809 us - 23.570149451806063% | 359477.8899 us - 23.692168165422427% | 356440.4841 us - 23.714761161335133% | 355773.1939 us - 23.5754416723178%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us): 223758.3096 us - 14.905861011513531% | 224938.2721 us - 14.825043539994219% | 224282.1638 us - 14.921980483485841% | 226053.6587 us - 14.979528915812129%
[Avg&Per] (D-2)8_6_putIntoBatchData(us): 328221.5562 us - 21.864774123677403% | 332305.597 us - 21.90131940694663% | 325251.7878 us - 21.639709317671908% | 325993.7043 us - 21.602092830601723%
• Compared with the other compression methods, GZIP achieves the highest compression ratio, and correspondingly its D-1 decompression step accounts for a larger share of the total time.
• The real dataset compresses well and its on-disk volume is small, so category-D time exceeds category-B time; the overall bottleneck lies in the category-D (CPU) operations rather than in disk I/O.
• Because the real dataset compresses well, the dominant cost inside D-1 is substep 7_2_data_decompress_PageDataByteArray(us).
  • In the synthetic-data experiment, substep 7_1_data_ByteBuffer_to_ByteArray(us) also took a large share; this is mainly because the synthetic data compresses poorly, which makes 7_2_data_decompress_PageDataByteArray(us) comparatively cheap and inflates the relative share of 7_1_data_ByteBuffer_to_ByteArray(us).
• In this experiment no single substep stands out as a bottleneck inside the D-2 operations.

Follow-up: rerun the experiment after increasing the volume of the real dataset.
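The asymmetry behind these conclusions can be reproduced with a toy measurement: a step-1 timestamp column delta-encodes into a constant run that DEFLATE collapses almost entirely, while values drawn uniformly from [0,100) keep about 6.6 bits of irreducible entropy per point. A hedged sketch using Python's zlib as a stand-in for GZIP's DEFLATE (this is illustrative only, not IoTDB code):

```python
import random
import zlib

random.seed(0)
n = 100_000

# Timestamp column: 1, 2, 3, ... -> delta-encoded as a run of identical bytes,
# which a general-purpose compressor collapses almost completely.
deltas = bytes(1 for _ in range(n))
ts_ratio = len(deltas) / len(zlib.compress(deltas))

# Value column: random integers in [0, 100), packed one per byte. Each byte
# already carries ~6.6 bits of entropy, so compression gains little.
vals = bytes(random.randrange(100) for _ in range(n))
val_ratio = len(vals) / len(zlib.compress(vals))

print(f"timestamp column ratio ~{ts_ratio:.0f}x, value column ratio ~{val_ratio:.2f}x")
```

The timestamp column shrinks by orders of magnitude while the value column barely moves, which is the same effect that makes the real (smooth) dataset I/O-light and the random synthetic dataset I/O-heavy.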

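For reference, the time column's efficiency comes from differencing: regularly spaced timestamps reduce to a run of identical deltas before compression. A toy delta codec in the spirit of TS_2DIFF (our own simplification, not the TsFile implementation):

```python
def delta_encode(ts):
    """Store the first timestamp plus successive differences."""
    if not ts:
        return []
    out = [ts[0]]
    out.extend(b - a for a, b in zip(ts, ts[1:]))
    return out

def delta_decode(enc):
    """Rebuild the original timestamps by cumulative summing."""
    if not enc:
        return []
    ts = [enc[0]]
    for d in enc[1:]:
        ts.append(ts[-1] + d)
    return ts

ts = list(range(1, 1001))      # timestamps 1..1000, step 1
enc = delta_encode(ts)
assert delta_decode(enc) == ts
assert set(enc[1:]) == {1}     # every delta is 1 -> trivially compressible
```

Decoding this stream is exactly the per-point work that substeps 8_2_timeDecoder_hasNext and 8_3_timeDecoder_readLong pay in the measurements above.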