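I. Experiment Setup

Goals

  1. Compare the disk I/O cost and the CPU cost of reading a TsFile.
  2. Determine whether any single operation stands out as a bottleneck within the CPU cost.

IoTDB version

  • v0.13.1

Environment

  • FIT Building, 166.111.130.101 / 192.168.130.31
  • CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
  • L1 cache 284KB, L2 cache 1536KB, L3 cache 12MB
  • Memory: 16G
  • Disk: 1.8T HDD, /dev/sdb1 mounted on /disk
  • OS: Ubuntu 16.04.7 LTS
  • Working directory: /disk/rl/tsfileReadExp/

Datasets

  • CRRC (中车) data: /disk/rl/zc_data
  • Synthetic data: timestamps start at 1 and increase with step 1; values are random integers in [0,100)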
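
The synthetic dataset described above can be sketched in a few lines. This is only an illustration of the data distribution, not the generator used by RLTsFileReadCostBench; the function name and the seed are assumptions made for reproducibility.

```python
import random

def generate_synthetic_points(n, seed=0):
    """Yield (timestamp, value) pairs.

    Timestamps start at 1 and increase with step 1; values are random
    integers in [0, 100). The seed is an assumption of this sketch.
    """
    rng = random.Random(seed)
    for ts in range(1, n + 1):
        yield ts, rng.randrange(0, 100)

points = list(generate_synthetic_points(5))
print(points)
```

Because the timestamps are perfectly regular and the values are uniformly random, this dataset is an extreme case for both columns.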

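Tools

...

Experiment design

Analyze the TsFile file structure, decompose the time-consuming steps of reading a TsFile and group them into categories, then measure each step experimentally.

TsFile structure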

[Figure: TsFile structure]

Decomposition and classification of TsFile read steps

[Figure: decomposition and classification of TsFile read steps]

Time categories and their sub-steps

(A)get_chunkMetadatas
  • (A)1_index_read_deserialize_MagicString_FileMetadataSize
  • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter
  • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet

(B)load_on_disk_chunk
  • (B)4_data_read_deserialize_ChunkHeader
  • (B)5_data_read_ChunkData

(C)get_pageHeader
  • (C)6_data_deserialize_PageHeader

(D-1)decompress_pageData
  • (D-1)7_1_data_ByteBuffer_to_ByteArray
  • (D-1)7_2_data_decompress_PageDataByteArray
  • (D-1)7_3_data_ByteArray_to_ByteBuffer
  • (D-1)7_4_data_split_time_value_Buffer

(D-2)decode_pageData
  • (D-2)8_1_createBatchData
  • (D-2)8_2_timeDecoder_hasNext
  • (D-2)8_3_timeDecoder_readLong
  • (D-2)8_4_valueDecoder_read
  • (D-2)8_5_checkValueSatisfyOrNot
  • (D-2)8_6_putIntoBatchData

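For goal 1: analyze the share of time taken by categories A, B, C, D-1 and D-2, and compare category B (where the disk I/O cost lies) with category D (where most of the CPU cost lies).

For goal 2: analyze the share of time taken by each sub-step inside D-1 and inside D-2.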
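
As a minimal sketch of the goal-1 analysis, the snippet below turns per-category times into shares and compares category B with category D. The numbers are the GZIP column of the CRRC compression results reported later on this page; the function name is an assumption of this sketch, not part of the benchmark tool.

```python
def category_shares(times_us):
    """Return each category's share of the total time, in percent."""
    total = sum(times_us.values())
    return {name: 100.0 * t / total for name, t in times_us.items()}

# GZIP column of the CRRC compression experiment (microseconds).
gzip_times_us = {
    "(A)get_chunkMetadatas": 87316.85010000001,
    "(B)load_on_disk_chunk": 160699.6025,
    "(C)get_pageHeader": 2436.5129999999995,
    "(D_1)decompress_pageData": 356587.5454999999,
    "(D_2)decode_pageData": 457950.96670000016,
}

shares = category_shares(gzip_times_us)
io_share = shares["(B)load_on_disk_chunk"]
cpu_share = shares["(D_1)decompress_pageData"] + shares["(D_2)decode_pageData"]
print(f"B (disk I/O): {io_share:.2f}%  D (CPU): {cpu_share:.2f}%")
```

With these inputs the D share is several times the B share, which is the comparison goal 1 asks for.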

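Conclusions

  1. The synthetic dataset used here is not representative: its timestamps start at 1 and increase with step 1, and its values are random integers in [0,100), so the timestamp column encodes and compresses extremely well while the value column encodes and compresses poorly. For the value column, PLAIN encoding even yields the smallest size among the encodings tested.
  2. Results on the CRRC ZT11529 dataset:
    • The real dataset compresses well, so relatively little data sits on disk. In this case the time to load Chunk data from disk is smaller than the time to decompress and decode Page data; that is, disk I/O is not the overall bottleneck.
    • Inside D-1, the bottleneck is sub-step 7_2_data_decompress_PageDataByteArray. Note: in the synthetic-data experiments another sub-step, 7_1_data_ByteBuffer_to_ByteArray(us), also takes a large share. The reason is that the synthetic data barely compresses, so 7_2_data_decompress_PageDataByteArray(us) takes relatively little time and the share of 7_1_data_ByteBuffer_to_ByteArray(us) looks relatively high.
    • Inside D-2, no single sub-step stands out as a bottleneck.
    • GZIP has the highest compression ratio among the compression methods, but there is a tradeoff between the disk-loading I/O cost and the decompression cost: the overall read time under GZIP is not the smallest.
  3. Follow-up
    1. Repeat the experiments on a larger real dataset; the CRRC data currently used is on the order of ten million points.
    2. The relationship between D-1 decompression and D-2 decoding, in both space savings and time cost, remains to be explored.
    3. The cost of writing data could also be measured.
    4. Note that RLE encoding is lossy for floating-point values.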
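
Conclusion 1 can be illustrated with a small sketch. It assumes a delta-of-delta view of the timestamps (the idea the TS_2DIFF name suggests) and uses zlib as a stand-in compressor; it is not the IoTDB encoder or any of the compressors measured here.

```python
import random
import struct
import zlib

n = 10000
timestamps = list(range(1, n + 1))                   # start at 1, step 1
rng = random.Random(0)                               # seed is an assumption
values = [rng.randrange(0, 100) for _ in range(n)]   # random integers in [0, 100)

def second_order_diffs(xs):
    """Differences of consecutive differences."""
    first = [b - a for a, b in zip(xs, xs[1:])]
    return [b - a for a, b in zip(first, first[1:])]

def zlib_size(xs):
    """Size in bytes after packing as 64-bit integers and compressing with zlib."""
    raw = struct.pack(f">{len(xs)}q", *xs)
    return len(zlib.compress(raw))

ts_diffs = second_order_diffs(timestamps)
ts_size = zlib_size(ts_diffs)
value_size = zlib_size(values)
print(ts_size, value_size)
```

The second-order differences of the timestamps are all zero, so almost nothing remains to store, while the random values keep most of their entropy. That is why the synthetic dataset says little about realistic workloads.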

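Datasets

(1) Synthetic data: timestamps start at 1 and increase with step 1; values are random integers in [0,100).

(2) CRRC data: /disk/rl/zc_data/ZT11529.csv. This sensor time series contains 12,780,287 points. A plot of part of the data and a screenshot of part of the csv are shown below.

[Figure: plot of part of the ZT11529 series]

[Figure: screenshot of part of ZT11529.csv]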

Tools

RLTsFileReadCostBench usage

(1) Write a TsFile from synthetic data

    java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_SYN [pagePointNum] [numOfPagesInChunk] [chunksWritten] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]

  • WRITE_SYN: selects "write synthetic data" among the three modes (write synthetic data / write real data / read data)
  • pagePointNum(ppn): number of points in a page
  • numOfPagesInChunk(pic): number of pages in a chunk
  • chunksWritten(cw): total number of chunks written
  • timeEncoding(te): encoding of the timestamp column
  • valueDataType(vt): data type of the value column
  • valueEncoding(ve): encoding of the value column
  • compressionType(co): compression method

(2) Write a TsFile from a real dataset

    java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar WRITE_REAL [path_of_real_data_csv_to_write] [pagePointNum] [numOfPagesInChunk] [timeEncoding] [valueDataType] [valueEncoding] [compressionType]

  • WRITE_REAL: selects "write real data" among the three modes
  • path_of_real_data_csv_to_write: path of the real-dataset csv used to write the TsFile
  • pagePointNum(ppn): number of points in a page
  • numOfPagesInChunk(pic): number of pages in a chunk
  • timeEncoding(te): encoding of the timestamp column
  • valueDataType(vt): data type of the value column
  • valueEncoding(ve): encoding of the value column
  • compressionType(co): compression method

(3) Read experiment

    java -jar RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar READ [path_of_tsfile_to_read] [decomposeMeasureTime] [D_decompose_each_step] (timeEncoding)

  • READ: selects "read data" among the three modes
  • path_of_tsfile_to_read: path of the TsFile to read
  • decomposeMeasureTime: FALSE measures the read process as a whole, in which case D_decompose_each_step is ignored. TRUE measures the decomposed read process, with the granularity controlled by D_decompose_each_step.
  • D_decompose_each_step: when decomposeMeasureTime is TRUE, D_decompose_each_step=FALSE measures the "(D_1)decompress_pageData" and "(D_2)decode_pageData" steps without further decomposition; D_decompose_each_step=TRUE breaks these two steps down further and measures the sub-steps inside them.
  • timeEncoding(te): optional. If it is not specified, TS_2DIFF is used by default. It must be the same as the timeEncoding used to write the TsFile.
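
When the read experiment is driven from a script, the argument order above can be captured in a small helper. This is a hypothetical wrapper, not part of the tool; it only assembles the documented arguments, and the tsfile path in the example is made up.

```python
def build_read_command(jar_path, tsfile_path, decompose_measure_time,
                       decompose_each_step, time_encoding=None):
    """Assemble the READ invocation as an argument list.

    time_encoding is optional; per the usage above, the tool falls back
    to TS_2DIFF when it is omitted.
    """
    cmd = [
        "java", "-jar", jar_path, "READ", tsfile_path,
        str(bool(decompose_measure_time)).upper(),
        str(bool(decompose_each_step)).upper(),
    ]
    if time_encoding is not None:
        cmd.append(time_encoding)
    return cmd

cmd = build_read_command(
    "RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar",
    "example.tsfile",  # hypothetical path
    True, False,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where the jar is available.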


Control parameters and the minimal steps measured in each mode

decomposeMeasureTime=FALSE

  • total_time(us)

decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE

Used to analyze the share of time taken by categories A/B/C/D-1/D-2 (goal 1).

  • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
  • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
  • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
  • (B)4_data_read_deserialize_ChunkHeader(us)
  • (B)5_data_read_ChunkData(us)
  • (C)6_data_deserialize_PageHeader(us)
  • (D-1)7_data_decompress_PageData(us)
  • (D-2)8_data_decode_PageData(us)

decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE

Used to analyze the share of time taken by each sub-step inside D-1 and inside D-2 (goal 2).

  • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
  • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
  • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
  • (B)4_data_read_deserialize_ChunkHeader(us)
  • (B)5_data_read_ChunkData(us)
  • (C)6_data_deserialize_PageHeader(us)
  • (D-1)7_1_data_ByteBuffer_to_ByteArray(us)
  • (D-1)7_2_data_decompress_PageDataByteArray(us)
  • (D-1)7_3_data_ByteArray_to_ByteBuffer(us)
  • (D-1)7_4_data_split_time_value_Buffer(us)
  • (D-2)8_1_createBatchData(us)
  • (D-2)8_2_timeDecoder_hasNext(us)
  • (D-2)8_3_timeDecoder_readLong(us)
  • (D-2)8_4_valueDecoder_read(us)
  • (D-2)8_5_checkValueSatisfyOrNot(us)
  • (D-2)8_6_putIntoBatchData(us)
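
The decomposed mode amounts to wrapping each sub-step in its own timer. The sketch below mimics the four D-1 sub-steps. It is not IoTDB code: zlib stands in for the real compressors, and the page layout (a 4-byte big-endian time-buffer length, then the time bytes, then the value bytes) is an assumption made only for this example.

```python
import struct
import time
import zlib

def timed(label, timings, fn, *args):
    """Run fn(*args) and add its elapsed time in microseconds under label."""
    start = time.perf_counter_ns()
    result = fn(*args)
    timings[label] = timings.get(label, 0.0) + (time.perf_counter_ns() - start) / 1000.0
    return result

def split_time_value(page):
    """Split a page into time and value buffers (layout assumed by this sketch)."""
    (time_len,) = struct.unpack_from(">I", page, 0)
    return page[4:4 + time_len], page[4 + time_len:]

def read_page(compressed_view, timings):
    raw = timed("7_1_ByteBuffer_to_ByteArray", timings, bytes, compressed_view)
    page = timed("7_2_decompress_PageDataByteArray", timings, zlib.decompress, raw)
    view = timed("7_3_ByteArray_to_ByteBuffer", timings, memoryview, page)
    return timed("7_4_split_time_value_Buffer", timings, split_time_value, view)

# Build a tiny page with three points, then read it back with timing.
time_bytes = struct.pack(">3q", 1, 2, 3)
value_bytes = struct.pack(">3q", 42, 7, 99)
page = struct.pack(">I", len(time_bytes)) + time_bytes + value_bytes

timings = {}
t_buf, v_buf = read_page(memoryview(zlib.compress(page)), timings)
for step, us in timings.items():
    print(f"{step}: {us:.1f} us")
```

Timing every sub-step adds overhead of its own, which is one reason the tool also offers the coarser modes listed above.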


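Automation scripts

(1) RLUnitSynExp.sh: writes a TsFile from synthetic data, clears the system cache, then runs one TsFile read experiment.

  • Input: see the write and read parameters of RLTsFileReadCostBench
  • Output: one TsFile, one TsFile space-statistics file (*writeResult.csv), and one csv file of TsFile read times (*readResult-T*csv)

(2) RLUnitRealExp.sh: writes a TsFile from real data, clears the system cache, then runs one TsFile read experiment.

  • Input: see the write and read parameters of RLTsFileReadCostBench
  • Output: one TsFile, one TsFile space-statistics file (*writeResult.csv), and one csv file of TsFile read times (*readResult-T*csv)

(3) RLReadExpScripts.sh: repeats the read experiment several times, collects the read results, merges the space results of the write with the time results of the reads, and finally computes averages and percentages over the read results.

  • Input:
    • WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
    • Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the repeated read results
    • FILE_NAME: path of the TsFile to read
    • decomposeMeasureTime: see the read parameters of RLTsFileReadCostBench
    • D_decompose_each_step: see the read parameters of RLTsFileReadCostBench
    • te: see the read parameters of RLTsFileReadCostBench
    • REPEAT: number of repetitions of the read experiment
  • Output:
    • REPEAT csv files of TsFile read times: *readResult-T*csv
    • one csv file concatenating the repeated read results horizontally: *readResult-combined.csv
    • one csv file joining the write results and the read results: *allResult-combined.csv
    • one csv file with the averaged read results and percentages at different granularities: *allResult-combined-processed.csv

(4) RLCompressionExpScripts.sh: for each compression method (UNCOMPRESSED, SNAPPY, GZIP, LZ4), writes a TsFile, clears the system cache, runs the TsFile read experiment several times, collects the read results, merges the space results of the write with the time results of the reads, and finally computes averages and percentages over the read results.

  • Input:
    • Tool paths:
      • WRITE_READ_JAR_PATH: path of RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar
      • Calculator_JAR_PATH: path of RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentages over the repeated read results
      • TOOL_PATH: path of RLtool.sh, a helper script that replaces variable values inside scripts
      • READ_SCRIPT_PATH: path of RLReadExpScripts.sh
    • Write parameters: see the write parameters of RLTsFileReadCostBench
    • Read parameters: see the read parameters of RLTsFileReadCostBench
    • REPEAT: number of repetitions of the read experiment
  • Output: for each compression method, one TsFile, one TsFile space-statistics file (*writeResult.csv), REPEAT csv files of TsFile read times (*readResult-T*csv), one csv file concatenating the repeated read results horizontally (*readResult-combined.csv), one csv file joining the write and read results (*allResult-combined.csv), and one csv file with the averaged read results and percentages at different granularities (*allResult-combined-processed.csv)

The other scripts are similar and are not described here.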
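
The averaging and percentage step performed on the repeated read results can be sketched as follows. This is an illustration of the computation only, not the code of RLRepeatReadResultAvgPercCalculator; the step names and timings in the example are made-up placeholders.

```python
def average_and_percentages(runs):
    """Average each step over repeated runs, then express each average
    as a percentage of the summed averages.

    runs: list of dicts mapping step name -> elapsed time in microseconds.
    """
    steps = list(runs[0])
    avg = {s: sum(run[s] for run in runs) / len(runs) for s in steps}
    total = sum(avg.values())
    pct = {s: 100.0 * avg[s] / total for s in steps}
    return avg, pct

# Two illustrative repetitions with placeholder numbers.
runs = [
    {"step_a": 10.0, "step_b": 30.0},
    {"step_a": 30.0, "step_b": 30.0},
]
avg, pct = average_and_percentages(runs)
print(avg, pct)
```

Averaging before taking percentages keeps the shares consistent with the averaged total, which matches the "[Avg&Per]" rows of the result tables.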

In terms of the experiment goals:

  • Goal 1: decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE. Compare category B (where the disk I/O cost lies) with category D (where most of the CPU cost lies).
  • Goal 2: decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE. Analyze the share of each sub-step inside D-1 and inside D-2.

II. Experiment Results

Results on CRRC data

Varying the compression method

RLCompressionRealExpScripts.sh

Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | GORILLA | GORILLA | GORILLA
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 19.34862614 | 23.77741051 | 23.15641212 | 36.30773735
chunkDataSize_stats_mean(MB) | 1.515139659 | 1.86007007 | 1.813184341 | 2.837062438
compressedPageSize_stats_mean(B) | 15824.16667 | 19440.275 | 18948.69 | 29684.7575
uncompressedPageSize_stats_mean(B) | 29684.7575 | 29684.7575 | 29684.7575 | 29684.7575
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 18221.26833 | 18221.26833 | 18221.26833
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 87316.85010000001 us - 8.19883087518919% | 100289.6416 us - 12.59103802007131% | 89466.6045 us - 11.199530364576052% | 88760.777 us - 10.335699109318087%
[Avg&Per] (B)load_on_disk_chunk | 160699.6025 us - 15.089285299443361% | 176784.88239999997 us - 22.19476647997349% | 105802.9384 us - 13.244531050378352% | 191518.85280000002 us - 22.301305860612082%
[Avg&Per] (C)get_pageHeader | 2436.5129999999995 us - 0.22878239411203669% | 2198.3668000000007 us - 0.27599779517870476% | 2319.9517000000005 us - 0.29041416798711567% | 3134.228800000001 us - 0.3649635223062446%
[Avg&Per] (D_1)decompress_pageData | 356587.5454999999 us - 33.482666568996265% | 31158.160799999983 us - 3.911805656191469% | 115629.7179 us - 14.474658381255693% | 29640.983400000016 us - 3.4515277590088287%
[Avg&Per] (D_2)decode_pageData | 457950.96670000016 us - 43.000434862259155% | 486085.0215000002 us - 61.02639204858503% | 485623.25309999986 us - 60.790866035802786% | 545723.8052999998 us - 63.54650374875476%
SUM | 1064991.4778 us | 796516.0731000002 us | 798842.4655999999 us | 858778.6472999998 us
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1269.7615999999996 us - 0.36377892330196115% | 1808.5776999999991 us - 7.207505165727991% | 1687.9916000000003 us - 1.412377327892232% | 3197.6313 us - 41.86667970732365%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 345856.23130000004 us - 99.08569249502276% | 21247.39440000001 us - 84.67466169480036% | 116100.69619999993 us - 97.14384305311928% | 2432.4552000000012 us - 31.84820675254647%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 374.18470000000025 us - 0.10720162531460037% | 442.10720000000003 us - 1.761876157051776% | 424.26030000000026 us - 0.35498732863644405% | 421.1930000000002 us - 5.51469221169019%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1547.4220999999993 us - 0.443326956360674% | 1594.8989000000004 us - 6.355956982419887% | 1301.2614999999998 us - 1.0887922903520593% | 1586.372500000001 us - 20.770421328439692%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 3430.8672 us - 0.22855030377902266% | 4174.8958 us - 0.275155541260775% | 3438.1465 us - 0.22874736939899784% | 3730.6205 us - 0.24721094086707432%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 234016.77980000002 us - 15.589238228946503% | 236135.9462 us - 15.56305048087338% | 235278.4135 us - 15.65358490817499% | 234469.6457 us - 15.537217392727715%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 357893.9434 us - 23.841426880277478% | 360253.1425 us - 23.743262865502565% | 358341.1524 us - 23.84121675993312% | 363063.0738 us - 24.058508247673558%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 353821.6809 us - 23.570149451806063% | 359477.8899 us - 23.692168165422427% | 356440.4841 us - 23.714761161335133% | 355773.1939 us - 23.5754416723178%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 223758.3096 us - 14.905861011513531% | 224938.2721 us - 14.825043539994219% | 224282.1638 us - 14.921980483485841% | 226053.6587 us - 14.979528915812129%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 328221.5562 us - 21.864774123677403% | 332305.597 us - 21.90131940694663% | 325251.7878 us - 21.639709317671908% | 325993.7043 us - 21.602092830601723%
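
The tradeoff between compression ratio and read time in the CRRC compression results can be checked with a short calculation. The numbers below are copied from the compressedPageSize_stats_mean, uncompressedPageSize_stats_mean and SUM rows of those results; the variable names are assumptions of this sketch.

```python
# Mean page sizes in bytes and summed read times in microseconds,
# copied from the CRRC compression results (TS_2DIFF + GORILLA).
compressed_page_bytes = {
    "GZIP": 15824.16667,
    "LZ4": 19440.275,
    "SNAPPY": 18948.69,
    "UNCOMPRESSED": 29684.7575,
}
uncompressed_page_bytes = 29684.7575
total_read_us = {
    "GZIP": 1064991.4778,
    "LZ4": 796516.0731000002,
    "SNAPPY": 798842.4655999999,
    "UNCOMPRESSED": 858778.6472999998,
}

# Compressed size as a fraction of the uncompressed page (smaller is better).
ratio = {m: size / uncompressed_page_bytes for m, size in compressed_page_bytes.items()}
smallest = min(ratio, key=ratio.get)
fastest = min(total_read_us, key=total_read_us.get)

for m in ratio:
    print(f"{m}: size ratio {ratio[m]:.3f}, read time {total_read_us[m] / 1000:.1f} ms")
print("smallest pages:", smallest, "| fastest read:", fastest)
```

The method with the smallest pages is not the one with the fastest read, which is the tradeoff stated in the conclusions.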


Results on synthetic data

Varying the compression method

RLCompressionSynExpScripts.sh

Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
dataset | synthetic | synthetic | synthetic | synthetic
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000
chunksWritten(cw) | 10 | 10 | 10 | 10
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | INT64 | INT64 | INT64 | INT64
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 100000000 | 100000000 | 100000000 | 100000000
tsfileSize(MB) | 767.1312866 | 770.8444319 | 767.9423904 | 781.4226151
chunkDataSize_stats_mean(MB) | 76.71300761 | 77.08436436 | 76.7941264 | 78.14216614
compressedPageSize_stats_mean(B) | 80375.41867 | 80764.81444 | 80460.47789 | 81874
uncompressedPageSize_stats_mean(B) | 81874 | 81874 | 81874 | 81874
timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 101654.7566 us - 0.9318259204986696% | 86735.1136 us - 1.1689451651044902% | 82918.49919999999 us - 1.0656107165107132% | 93896.1409 us - 1.2531889263513831%
[Avg&Per] (B)load_on_disk_chunk | 5949593.8917000005 us - 54.53739687304131% | 4849630.197000001 us - 65.35936296194448% | 5086618.179 us - 65.36966894765757% | 4851386.484999999 us - 64.74924062031137%
[Avg&Per] (C)get_pageHeader | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579%
[Avg&Per] (D_1)decompress_pageData | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805%
[Avg&Per] (D_2)decode_pageData | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567%
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 65952.37819999998 us - 2.5136549346918415% | 108809.34350000018 us - 59.6896105506002% | 108132.35939999981 us - 43.622622294156905% | 110765.11740000003 us - 63.813731447511664%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 2554687.926599999 us - 97.36728361519128% | 68904.38600000006 us - 37.79892271446546% | 135345.91170000008 us - 54.601079805416944% | 57547.215800000035 us - 33.15396273496119%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 811.8335000000624 us - 0.03094155721322126% | 1239.949800000022 us - 0.680200047933345% | 1184.7460000000272 us - 0.47794876167766753% | 1229.1439000000355 us - 0.7081314098345313%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 2312.0582000000422 us - 0.08811989290364733% | 3338.2513999999974 us - 1.8312666870009688% | 3218.3657999999955 us - 1.29834913874849% | 4034.201500000018 us - 2.3241744076926314%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 5384.7852 us - 0.053292019060348375% | 5848.7599 us - 0.05759123169122766% | 5913.4963 us - 0.058362326692940975% | 6019.3023 us - 0.05943520403215091%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1859842.2956 us - 18.406444711361424% | 1862234.7849 us - 18.336946086748988% | 1864092.3926 us - 18.397368271414525% | 1857778.6739 us - 18.343895858133802%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2074757.7936 us - 20.533415498567617% | 2084700.4377 us - 20.527508047369906% | 2063043.8916 us - 20.360888969091857% | 2069930.4964 us - 20.43870456313607%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 1876012.952 us - 18.56648209392724% | 1881471.5433999998 us - 18.526365490982297% | 1877809.2412 us - 18.532744562964893% | 1876843.1276 us - 18.53214021585961%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1780379.6374 us - 17.620020492363725% | 1780782.3133 us - 17.534904586680103% | 1781949.2049 us - 17.586668929952697% | 1780599.5789 us - 17.5818216126948%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2507922.0072 us - 24.82034518471963% | 2540605.1784 us - 25.016684556527476% | 2539577.912 us - 25.063966939883077% | 2536332.2055 us - 25.044002546143567%

Category B takes more time than category D. The reason: the synthetic values are random INT64 integers with PLAIN encoding, and none of the four compression methods achieves a good compression ratio on them, so the amount of data on disk is relatively large.

Supplementary experiment on CRRC data

Both the timestamp column and the value column use PLAIN encoding, and the compression method is then varied.

RLCompressionRealExpScripts.sh

Compression method | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | PLAIN | PLAIN | PLAIN | PLAIN
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 50.7578783 | 74.71146107 | 75.81546879 | 195.0946026
chunkDataSize_stats_mean(MB) | 3.977135579 | 5.846096277 | 5.932730039 | 15.26517868
compressedPageSize_stats_mean(B) | 41639.28917 | 61236.7625 | 62145.18333 | 160003
uncompressedPageSize_stats_mean(B) | 160003 | 160003 | 160003 | 160003
timeBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
total_time(us) | 2096685.5981 | 1244193.0404 | 1195582.1173 | 1895095.384
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 86791.3859 us - 4.1844839300349514% | 100869.9527 us - 8.731470875680406% | 99547.05960000001 us - 7.911992556094877% | 88166.6213 us - 4.659966496653105%
[Avg&Per] (B)load_on_disk_chunk | 349328.55549999996 us - 16.84222128306944% | 452828.32 us - 39.197572537012675% | 450015.7795 us - 35.76721916082827% | 1155773.8993 us - 61.08737716190697%
[Avg&Per] (C)get_pageHeader | 2818.293900000001 us - 0.13587875585087916% | 3913.898699999998 us - 0.3387935811871694% | 3502.114700000002 us - 0.27834780402685505% | 4450.306 us - 0.23521687180559223%
[Avg&Per] (D_1)decompress_pageData | 1350175.8349000001 us - 65.09619618668371% | 261417.90119999985 us - 22.628768326063632% | 395144.6617 us - 31.40606698493612% | 173785.97769999996 us - 9.185299626198828%
[Avg&Per] (D_2)decode_pageData | 285009.94010000007 us - 13.74121984436101% | 336215.7518000001 us - 29.103394680056127% | 309969.7735 us - 24.636373494113883% | 469824.3798999997 us - 24.832139843435506%
SUM | 2074124.0103 us | 1155245.8243999998 us | 1258179.389 us | 1892001.1841999996 us
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 4312.080999999998 us - 0.3365746911741179% | 10247.3701 us - 5.205625954425964% | 9355.254500000003 us - 2.648386471251286% | 33088.566800000015 us - 56.71920599957147%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 1274619.9271000002 us - 99.48904214184739% | 183796.93149999995 us - 93.36815862249874% | 341381.81159999996 us - 96.64204980983628% | 21318.842899999996 us - 36.54397754446109%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 454.68920000000026 us - 0.03549026028736633% | 659.1047999999998 us - 0.334822790636471% | 604.4583000000001 us - 0.17111658310904862% | 973.5250000000001 us - 1.6687808013713306%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1779.4489000000008 us - 0.1388929066911369% | 2148.4263999999994 us - 1.091392632438828% | 1902.0298000000007 us - 0.5384471358033915% | 2956.5653000000016 us - 5.068035654596102%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 3343.3923 us - 0.259976288059772% | 3522.578 us - 0.27212097540020236% | 3458.9225 us - 0.2672966393395521% | 3730.5984 us - 0.2816327187407449%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 232202.511 us - 18.05565768873081% | 231677.2947 us - 17.897191037883086% | 231911.8133 us - 17.921548782382853% | 237390.8697 us - 17.921263258420133%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 254086.67 us - 19.75733129255225% | 255389.3804 us - 19.72896194244707% | 255976.5501 us - 19.78120978179259% | 261059.1494 us - 19.70804415658043%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 241634.0221 us - 18.78903535624908% | 242640.7293 us - 18.744122040429612% | 242328.9765 us - 18.726560376256852% | 246610.5964 us - 18.6172847590372%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 230100.908 us - 17.892240746329144% | 231053.5736 us - 17.849008259784295% | 231043.8535 us - 17.85447508020484% | 235200.0462 us - 17.755872210542634%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 324669.8983 us - 25.24575862807894% | 330206.1447 us - 25.508595744055732% | 329318.78729999997 us - 25.448909340023306% | 340641.1962 us - 25.715902896678855%

  • When both the timestamp column and the value column use PLAIN encoding, compression is responsible for the entire size reduction, and the share of time taken by D-1 rises noticeably. Even so, for every compression method other than GZIP the D-1 share still does not exceed 60%, and D-2 decoding keeps a considerable base cost.

Varying the value encoding

RLValueEncodingRealExpScripts.sh

Encoding | GORILLA | PLAIN | RLE | TS_2DIFF
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF
compression(co) | SNAPPY | SNAPPY | SNAPPY | SNAPPY
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 23.15641212 | 26.59960175 | 22.10147858 | 20.98550797
chunkDataSize_stats_mean(MB) | 1.813184341 | 2.081166744 | 1.740072568 | 1.642661015
compressedPageSize_stats_mean(B) | 18948.69 | 21758.61667 | 18182.29917 | 17160.8925
uncompressedPageSize_stats_mean(B) | 29684.7575 | 91463.48917 | 25245.93417 | 22210.36417
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 80000 | 13782.445 | 10746.875
[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 88753.03020000001 us - 11.21274000219672% | 97256.88870000001 us - 12.181751942467722% | 84509.5108 us - 9.498496964807579% | 86370.7961 us - 11.6687579647861%
[Avg&Per] (B)load_on_disk_chunk | 105215.9241 us - 13.29260306229143% | 138228.6108 us - 17.313597737139155% | 139816.5536 us - 15.714765088895252% | 96826.5282 us - 13.081335047881257%
[Avg&Per] (C)get_pageHeader | 2352.1583 us - 0.2971632563133493% | 2625.5312999999983 us - 0.328856613051254% | 3388.6254999999987 us - 0.38086658793698225% | 3159.255499999999 us - 0.4268177375109232%
[Avg&Per] (D_1)decompress_pageData | 114457.26139999992 us - 14.460120522641777% | 182862.35910000015 us - 22.904124612107367% | 155021.76800000004 us - 17.42376424722015% | 120684.50350000002 us - 16.304565026949902%
[Avg&Per] (D_2)decode_pageData | 480759.0216999998 us - 60.73737315655672% | 377408.39589999994 us - 47.27166909523449% | 506978.10549999995 us - 56.982107111140046% | 433147.3443000001 us - 58.51852422287182%
SUM | 791537.3956999998 us | 798381.7858000002 us | 889714.5634 us | 740188.4276000002 us
[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1754.3673999999999 us - 1.6860113426598728% | 2320.3632000000002 us - 1.489795195953362% | 1825.1960999999988 us - 1.548613686329812% | 1607.3271999999997 us - 1.758246074195737%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 100527.56879999992 us - 96.61067644486589% | 151532.73299999998 us - 97.29198327791244% | 113686.73430000001 us - 96.4591326329927% | 87212.82 us - 95.4016073295714%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 431.9958999999999 us - 0.41516388607230165% | 468.2119000000003 us - 0.30061666178303303% | 474.28859999999975 us - 0.4024169332984032% | 663.3730000000002 us - 0.7256599483773118%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1340.373799999999 us - 1.288148326401935% | 1429.1740000000002 us - 0.9176048643511714% | 1873.7816000000007 us - 1.5898367473790769% | 1932.989900000001 us - 2.1144866478555286%
[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 3348.9259 us - 0.2218271110091239% | 3442.3069 us - 0.24318767981262365% | 3466.6887 us - 0.22277034162258696% | 3338.6453 us - 0.22194982158403212%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 235317.741 us - 15.587043790733997% | 244839.8323 us - 17.29713023052909% | 239308.8755 us - 15.37805224577913% | 240223.005 us - 15.969786637750962%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 361562.5392 us - 23.949282819264262% | 363364.4154 us - 25.67050285597614% | 362143.5815 us - 23.271443255687238% | 357833.1113 us - 23.788389623148674%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 356325.4946 us - 23.602389962111484% | 241625.5526 us - 17.07005192367858% | 396438.5566 us - 25.475247513902037% | 335421.7093 us - 22.298501890735768%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 225554.2633 us - 14.940327764084076% | 232364.86419999998 us - 16.415814695306036% | 224510.63450000001 us - 14.4271133273284% | 230332.4743 us - 15.312273986066641%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 327591.9399 us - 21.699128552797042% | 329856.9807 us - 23.30331261469754% | 330303.223 us - 21.225373315680617% | 337085.3345 us - 22.409098040713936%

Observations from the compression experiment on CRRC data:

  • Compared with the other compression methods, GZIP has the highest compression ratio, and its D-1 decompression step also takes a larger share of the time.
  • The real dataset compresses well and occupies little disk space, so category D takes more time than category B; that is, the overall bottleneck lies in category D.
  • Because the real dataset compresses well, the main bottleneck inside D-1 is sub-step 7_2_data_decompress_PageDataByteArray(us).
    • In the synthetic-data experiments another sub-step, 7_1_data_ByteBuffer_to_ByteArray(us), also takes a large share. The main reason is that the synthetic data barely compresses, so 7_2_data_decompress_PageDataByteArray(us) takes little time and the share of 7_1_data_ByteBuffer_to_ByteArray(us) becomes relatively high.
  • In this experiment, no sub-step inside D-2 stands out as a bottleneck.

Follow-up: repeat the experiments on a larger real dataset.

Varying the encoding

Results on synthetic data

RLValueEncodingSynExpScripts.sh


    人工数据实验结果

    改变压缩方式

    RLCompressionSynExpScripts.sh
    Compression | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
    dataset | synthetic | synthetic | synthetic | synthetic
    pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
    numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000
    chunksWritten(cw) | 10 | 10 | 10 | 10
    timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
    valueDataType(vt) | INT64 | INT64 | INT64 | INT64
    valueEncoding(ve) | PLAIN | PLAIN | PLAIN | PLAIN
    compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
    totalPointNum | 100000000 | 100000000 | 100000000 | 100000000
    tsfileSize(MB) | 767.1312866 | 770.8444319 | 767.9423904 | 781.4226151
    chunkDataSize_stats_mean(MB) | 76.71300761 | 77.08436436 | 76.7941264 | 78.14216614
    compressedPageSize_stats_mean(B) | 80375.41867 | 80764.81444 | 80460.47789 | 81874
    uncompressedPageSize_stats_mean(B) | 81874 | 81874 | 81874 | 81874
    timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872
    valueBufferSize_stats_mean(B) | 80000 | 80000 | 80000 | 80000
    [2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
    [Avg&Per] (A)get_chunkMetadatas | 101654.7566 us - 0.9318259204986696% | 86735.1136 us - 1.1689451651044902% | 82918.49919999999 us - 1.0656107165107132% | 93896.1409 us - 1.2531889263513831%
    [Avg&Per] (B)load_on_disk_chunk | 5949593.8917000005 us - 54.53739687304131% | 4849630.197000001 us - 65.35936296194448% | 5086618.179 us - 65.36966894765757% | 4851386.484999999 us - 64.74924062031137%
    [Avg&Per] (C)get_pageHeader | 6613.381399999991 us - 0.060622054656159316% | 7120.158900000014 us - 0.09595969816001626% | 7692.346500000014 us - 0.09885667184764595% | 7605.696300000007 us - 0.10150975630087579%
    [Avg&Per] (D_1)decompress_pageData | 2859428.0031000106 us - 26.211160404158996% | 521804.2723000004 us - 7.032452670194604% | 605202.8143000009 us - 7.777644443672276% | 498170.42259999976 us - 6.648853201570805%
    [Avg&Per] (D_2)decode_pageData | 1991910.319299996 us - 18.25899474764487% | 1954657.4198999994 us - 26.343279504596413% | 1998880.5966999987 us - 25.68821922031179% | 2041517.9070999995 us - 27.247207495465567%
    [3] D_1 compare each step inside
    [Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 65952.37819999998 us - 2.5136549346918415% | 108809.34350000018 us - 59.6896105506002% | 108132.35939999981 us - 43.622622294156905% | 110765.11740000003 us - 63.813731447511664%
    [Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 2554687.926599999 us - 97.36728361519128% | 68904.38600000006 us - 37.79892271446546% | 135345.91170000008 us - 54.601079805416944% | 57547.215800000035 us - 33.15396273496119%
    [Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 811.8335000000624 us - 0.03094155721322126% | 1239.949800000022 us - 0.680200047933345% | 1184.7460000000272 us - 0.47794876167766753% | 1229.1439000000355 us - 0.7081314098345313%
    [Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 2312.0582000000422 us - 0.08811989290364733% | 3338.2513999999974 us - 1.8312666870009688% | 3218.3657999999955 us - 1.29834913874849% | 4034.201500000018 us - 2.3241744076926314%
    [3] D_2 compare each step inside
    [Avg&Per] (D-2)8_1_createBatchData(us) | 5384.7852 us - 0.053292019060348375% | 5848.7599 us - 0.05759123169122766% | 5913.4963 us - 0.058362326692940975% | 6019.3023 us - 0.05943520403215091%
    [Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1859842.2956 us - 18.406444711361424% | 1862234.7849 us - 18.336946086748988% | 1864092.3926 us - 18.397368271414525% | 1857778.6739 us - 18.343895858133802%
    [Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2074757.7936 us - 20.533415498567617% | 2084700.4377 us - 20.527508047369906% | 2063043.8916 us - 20.360888969091857% | 2069930.4964 us - 20.43870456313607%
    [Avg&Per] (D-2)8_4_valueDecoder_read(us) | 1876012.952 us - 18.56648209392724% | 1881471.5433999998 us - 18.526365490982297% | 1877809.2412 us - 18.532744562964893% | 1876843.1276 us - 18.53214021585961%
    [Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1780379.6374 us - 17.620020492363725% | 1780782.3133 us - 17.534904586680103% | 1781949.2049 us - 17.586668929952697% | 1780599.5789 us - 17.5818216126948%
    [Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2507922.0072 us - 24.82034518471963% | 2540605.1784 us - 25.016684556527476% | 2539577.912 us - 25.063966939883077% | 2536332.2055 us - 25.044002546143567%
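    • Overall, the synthetic dataset is a poor choice: its randomly generated values encode and compress badly, and PLAIN encoding even yields the smallest size.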

    CRRC data results

    RLValueEncodingRealExpScripts.sh

    The ZT11529 sensor data, shown in the figure below, contains 12,780,287 points in total.

    Image Removed



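    • Category-B operations take longer than category-D operations. Likely cause: the synthetic values are random INT64 integers stored with PLAIN encoding, and none of the four compression methods achieves a good compression ratio on them, so the amount of data on disk is relatively large.
    • Inside D-1, 7_1_data_ByteBuffer_to_ByteArray takes a relatively large share. Likely cause: the synthetic data compresses poorly, so decompression (7_2_data_decompress_PageDataByteArray) takes relatively little time, which raises the relative share of 7_1_data_ByteBuffer_to_ByteArray.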
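The two observations above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are the mean page sizes and category timings reported for the synthetic compression runs, while the dictionary layout and the helper names (`runs`, `summarize`) are hypothetical and not part of the experiment scripts.

```python
# Mean uncompressed page size in bytes (identical for all four runs).
UNCOMPRESSED_PAGE_BYTES = 81874.0

# Per compression method: mean compressed page size (bytes) and the average
# time (microseconds) of categories (B), (D_1) and (D_2), rounded to 4 decimals.
runs = {
    "GZIP":         {"compressed": 80375.41867, "B": 5949593.8917, "D1": 2859428.0031, "D2": 1991910.3193},
    "LZ4":          {"compressed": 80764.81444, "B": 4849630.1970, "D1": 521804.2723,  "D2": 1954657.4199},
    "SNAPPY":       {"compressed": 80460.47789, "B": 5086618.1790, "D1": 605202.8143,  "D2": 1998880.5967},
    "UNCOMPRESSED": {"compressed": 81874.0,     "B": 4851386.4850, "D1": 498170.4226,  "D2": 2041517.9071},
}


def summarize(runs):
    """Return compression ratio and the B / (D_1 + D_2) time ratio per run."""
    summary = {}
    for name, r in runs.items():
        summary[name] = {
            # ratio > 1 means the page shrank; values close to 1 mean "barely compressed"
            "ratio": UNCOMPRESSED_PAGE_BYTES / r["compressed"],
            # values > 1 mean disk loading (B) costs more than decompress + decode (D)
            "b_over_d": r["B"] / (r["D1"] + r["D2"]),
        }
    return summary


for name, s in summarize(runs).items():
    print(f"{name:13s} ratio={s['ratio']:.3f}  B/(D_1+D_2)={s['b_over_d']:.2f}")
```

With these inputs every compression ratio stays below 1.02 and every B/(D_1+D_2) ratio is above 1, which matches the explanation that the poorly compressible synthetic data keeps the on-disk volume, and therefore the category-B time, high.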

    Varying the value-column encoding

    RLValueEncodingSynExpScripts.sh
    Encoding | GORILLA | PLAIN | RLE | TS_2DIFF
    dataset | synthetic | synthetic | synthetic | synthetic
    pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
    numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000
    chunksWritten(cw) | 10 | 10 | 10 | 10
    timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
    valueDataType(vt) | INT64 | INT64 | INT64 | INT64
    valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF
    compression(co) | UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED
    totalPointNum | 100000000 | 100000000 | 100000000 | 100000000
    tsfileSize(MB) | 805.3812895 | 781.4226151 | 781.8422318 | 793.3244705
    chunkDataSize_stats_mean(MB) | 80.53803624 | 78.14216614 | 78.18412781 | 79.33235168
    compressedPageSize_stats_mean(B) | 84386.25189 | 81874 | 81918 | 83122
    uncompressedPageSize_stats_mean(B) | 84386.25189 | 81874 | 81918 | 83122
    timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872
    valueBufferSize_stats_mean(B) | 82512.25189 | 80000 | 80044 | 81248
    [2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
    [Avg&Per] (A)get_chunkMetadatas | 91331.98490000001 us - 0.8580518676474486% | 100944.7581 us - 1.2939556377951902% | 88098.20449999999 us - 0.9671805828234409% | 88231.2157 us - 0.9257222461823116%
    [Avg&Per] (B)load_on_disk_chunk | 5552645.935400001 us - 52.16637107440095% | 5170158.3812 us - 66.27343223726832% | 5270914.364100001 us - 57.866400973957255% | 5526099.6186 us - 57.97985793317826%
    [Avg&Per] (C)get_pageHeader | 8185.805399999992 us - 0.07690455451459878% | 7712.402700000001 us - 0.09886107156476356% | 7813.90999999998 us - 0.08578451820695045% | 7725.986500000003 us - 0.08106107934716146%
    [Avg&Per] (D_1)decompress_pageData | 548160.3352000009 us - 5.149893543905802% | 525441.0348000005 us - 6.7353412114264035% | 585036.2351000007 us - 6.422783415941812% | 632154.6457000006 us - 6.632568914631739%
    [Avg&Per] (D_2)decode_pageData | 4443785.968300003 us - 41.748778959531215% | 1996996.816400002 us - 25.598409841945312% | 3156902.088299994 us - 34.65785050907054% | 3276856.417400001 us - 34.38078982666053%
    [3] D_1 compare each step inside
    [Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 110421.72189999989 us - 64.93156378407389% | 109687.91739999971 us - 63.92643734398307% | 113658.35759999987 us - 62.901942587101885% | 109187.47799999996 us - 63.08420663361515%
    [Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 54624.26410000002 us - 32.12084386602246% | 57095.555900000094 us - 33.27545607007074% | 62164.880499999985 us - 34.403908579311135% | 59072.25890000005 us - 34.12961499817788%
    [Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 1179.3427000000347 us - 0.6934918640164246% | 1234.10400000004 us - 0.7192394012210651% | 1032.115900000022 us - 0.5712038820191121% | 1193.2719000000652 us - 0.6894253122110825%
    [Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 3833.2921999999994 us - 2.254100485887217% | 3567.0158000000133 us - 2.0788671847251163% | 3835.9775000000177 us - 2.122944951567873% | 3629.1045000000054 us - 2.0967530559958814%
    [3] D_2 compare each step inside
    [Avg&Per] (D-2)8_1_createBatchData(us) | 6008.9294 us - 0.04720551599260821% | 6005.094 us - 0.058953749593284185% | 9136.1988 us - 0.07959246314858166% | 6219.250599999999 us - 0.053213386787106826%
    [Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1795067.4479 us - 14.10186066084482% | 1862631.8749 us - 18.285997377780266% | 1805100.2815999999 us - 15.725618584694368% | 1838661.8702 us - 15.732028111177547%
    [Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2073615.2138 us - 16.290102549307967% | 2089493.8765 us - 20.513167449482335% | 2063469.3846 us - 17.976470800088325% | 2172029.5514 us - 18.58440125112089%
    [Avg&Per] (D-2)8_4_valueDecoder_read(us) | 4636195.4124 us - 36.42146247963989% | 1880348.902 us - 18.459930571697104% | 3352494.9242 us - 29.206164899804453% | 3364558.2458 us - 28.787960289219587%
    [Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1724239.435 us - 13.545443257159627% | 1784205.2864 us - 17.516060810611705% | 1723128.802 us - 15.01150190311584% | 1780807.3196 us - 15.237010821076382%
    [Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2494168.5891 us - 19.59392553705509% | 2563425.3348 us - 25.165890040835308% | 2525393.9444 us - 22.000651349148434% | 2525103.5280999998 us - 21.605386140618513%
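As a reading aid for the "[Avg&Per]" rows, the following sketch recomputes the percentage column of the D-1 block from the four average timings of the GORILLA run. It assumes that each percentage is the sub-step's average divided by the sum of the sub-step averages in the same block; that assumption reproduces the reported figures, but the function name `shares` is hypothetical and the measurement tool's actual code is not shown on this page.

```python
# Average time in microseconds of each D-1 sub-step, GORILLA column.
d1_gorilla_us = {
    "7_1_data_ByteBuffer_to_ByteArray": 110421.72189999989,
    "7_2_data_decompress_PageDataByteArray": 54624.26410000002,
    "7_3_data_ByteArray_to_ByteBuffer": 1179.3427000000347,
    "7_4_data_split_time_value_Buffer": 3833.2921999999994,
}


def shares(step_us):
    """Percentage of each step relative to the sum of all steps in the block."""
    total = sum(step_us.values())
    return {step: 100.0 * t / total for step, t in step_us.items()}


for step, pct in shares(d1_gorilla_us).items():
    print(f"(D-1){step}: {pct:.2f}%")
```

Note that the denominator here is the sum of the four sub-step timings, which is smaller than the (D_1) category average in the same column, so sub-step percentages describe the split inside D-1 and cannot be combined directly with the category percentages.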