
I. Experimental Setup

1 Objectives

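  1. Compare the disk I/O cost and the CPU cost of reading a TsFile.
  2. Find out whether the CPU cost contains any prominent time-consuming bottleneck operation.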

2 Experimental Design

...

Analyze the TsFile structure, decompose the time-consuming steps of reading a TsFile and classify them, and then measure them experimentally.


2.1 TsFile Structure

[Figure: TsFile structure]
2.2 Decomposition and Classification of TsFile Read Steps


[Figure: TsFile read steps]



Cost categories and their decomposed steps:

  • (A) get_chunkMetadatas
    • (A)1_index_read_deserialize_MagicString_FileMetadataSize
    • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter
    • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet
  • (B) load_on_disk_chunk
    • (B)4_data_read_deserialize_ChunkHeader
    • (B)5_data_read_ChunkData
  • (C) get_pageHeader
    • (C)6_data_deserialize_PageHeader
  • (D-1) decompress_pageData
    • (D-1)7_1_data_ByteBuffer_to_ByteArray
    • (D-1)7_2_data_decompress_PageDataByteArray
    • (D-1)7_3_data_ByteArray_to_ByteBuffer
    • (D-1)7_4_data_split_time_value_Buffer
  • (D-2) decode_pageData
    • (D-2)8_1_createBatchData
    • (D-2)8_2_timeDecoder_hasNext
    • (D-2)8_3_timeDecoder_readLong
    • (D-2)8_4_valueDecoder_read
    • (D-2)8_5_checkValueSatisfyOrNot
    • (D-2)8_6_putIntoBatchData
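The bench tool itself is a Java program, and its source is not shown here. As a language-neutral illustration of the bookkeeping behind this decomposition, the Python sketch below accumulates wall-clock time per step and rolls the steps up into categories A/B/C/D-1/D-2. Only the step names come from the list above. The `StepTimer` class, its methods, and the use of `time.perf_counter_ns` are hypothetical choices made for this sketch.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Step name -> cost category, copied from the decomposition above.
STEP_CATEGORY = {
    "1_index_read_deserialize_MagicString_FileMetadataSize": "A",
    "2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter": "A",
    "3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet": "A",
    "4_data_read_deserialize_ChunkHeader": "B",
    "5_data_read_ChunkData": "B",
    "6_data_deserialize_PageHeader": "C",
    "7_1_data_ByteBuffer_to_ByteArray": "D-1",
    "7_2_data_decompress_PageDataByteArray": "D-1",
    "7_3_data_ByteArray_to_ByteBuffer": "D-1",
    "7_4_data_split_time_value_Buffer": "D-1",
    "8_1_createBatchData": "D-2",
    "8_2_timeDecoder_hasNext": "D-2",
    "8_3_timeDecoder_readLong": "D-2",
    "8_4_valueDecoder_read": "D-2",
    "8_5_checkValueSatisfyOrNot": "D-2",
    "8_6_putIntoBatchData": "D-2",
}


class StepTimer:
    """Hypothetical accumulator of wall-clock time (us) per read step."""

    def __init__(self):
        self.elapsed_us = defaultdict(float)

    @contextmanager
    def measure(self, step):
        # Reject unknown step names so a typo cannot create a stray entry.
        if step not in STEP_CATEGORY:
            raise KeyError(f"unknown step: {step}")
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self.elapsed_us[step] += (time.perf_counter_ns() - start) / 1000.0

    def category_shares(self):
        """Return {category: (total_us, percent of all measured time)}."""
        totals = defaultdict(float)
        for step, us in self.elapsed_us.items():
            totals[STEP_CATEGORY[step]] += us
        grand_total = sum(totals.values())
        return {
            cat: (us, 100.0 * us / grand_total if grand_total else 0.0)
            for cat, us in totals.items()
        }
```

In a reader loop, each step would be wrapped as `with timer.measure("5_data_read_ChunkData"): ...`. Because the step times are additive, the category shares are relative to the sum of all measured steps.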


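Addressing objective 1: analyze the time shares of category A, B, C, D-1 and D-2 operations, and compare the time of category B operations (where the disk I/O cost lies) with that of category D operations (where most of the CPU cost lies).

Addressing objective 2: analyze the time shares of the sub-steps inside D-1 and of the sub-steps inside D-2.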

Conclusions

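  1. The synthetic dataset used in this experiment is not representative. Its timestamps start at 1 and increase with step 1, and its values are random integers in [0,100). As a result, the timestamps encode and compress far too well, while the value column encodes and compresses poorly. For the value column, PLAIN encoding even yields the smallest size of all encodings.
  2. Results on the CRRC ZT11529 dataset:
    1. GZIP has the highest compression ratio of all compression methods. However, there is a tradeoff between the (B) disk-loading I/O cost and the (D-1) decompression cost. Together with fluctuations in the time measurements, this means the overall read time under GZIP is not necessarily the smallest.
    2. The real dataset compresses well, so the on-disk data volume is relatively small. In this case, loading Chunk data from disk takes less time than decompressing and decoding Page data, i.e., disk I/O is not the overall bottleneck.
    3. Inside D-1, the main bottleneck is sub-step 7_2_data_decompress_PageDataByteArray(us). In the synthetic-data experiment, sub-step 7_1_data_ByteBuffer_to_ByteArray(us) also took a large share. This is mainly because the synthetic data compress poorly, so 7_2_data_decompress_PageDataByteArray(us) takes little time and 7_1_data_ByteBuffer_to_ByteArray(us) takes a relatively larger share.
    4. In this experiment, no sub-step inside D-2 is a prominent bottleneck.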
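Conclusion 1 says the synthetic timestamps compress "too well". A simplified model makes the reason visible. The model uses delta encoding with per-block minimum-delta subtraction and bit packing, which is the idea behind TS_2DIFF. It is an illustration, not IoTDB's exact on-disk format, and it ignores per-block header bytes. The block size of 128 and the function name `packed_bits_per_point` are assumptions of this sketch. For timestamps 1, 2, 3, …, every delta is 1, so after subtracting the block minimum nothing is left to store.

```python
def packed_bits_per_point(values, block_size=128):
    """Average packed bits per point under a simplified TS_2DIFF-like model.

    First-order deltas are split into blocks. Within each block, the minimum
    delta is subtracted and every residual is stored with the bit width of
    the largest residual. Block headers are ignored.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return 0.0
    total_bits = 0
    for start in range(0, len(deltas), block_size):
        block = deltas[start:start + block_size]
        base = min(block)
        width = max(d - base for d in block).bit_length()
        total_bits += width * len(block)
    return total_bits / len(deltas)


# Synthetic timestamps as described in the dataset section: 1, 2, 3, ...
synthetic_ts = list(range(1, 10_001))
print(packed_bits_per_point(synthetic_ts))
```

For the synthetic timestamps this prints 0.0, because every residual is zero. This matches the measured time-buffer sizes in the result tables: 1872 B per 10000-point page for the synthetic timestamps, versus 11461.4625 B for the real ZT11529 timestamps.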

Experimental Setup

IoTDB version

  • v0.13.1

Environment

  • FIT Building, 166.111.130.101 / 192.168.130.31
  • CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads)
  • L1 cache 284KB, L2 cache 1536KB, L3 cache 12MB
  • Memory: 16 GB
  • Disk: 1.8 TB HDD, /dev/sdb1 mounted on /disk
  • OS: Ubuntu 16.04.7 LTS
  • Working directory: /disk/rl/tsfileReadExp/

Datasets

...

(1) Synthetic data: timestamps start at 1 and increase with step 1; values are random integers in [0,100).

(2) CRRC data: /disk/rl/zc_data

...

/ZT11529.csv. This sensor time series contains 12,780,287 points. A plot of part of the data and a screenshot of part of the CSV are shown below.

[Figure: plot of part of the ZT11529 data]

[Figure: screenshot of part of ZT11529.csv]

...

Tools

...

  • WRITE_REAL: marks which of "write synthetic data / write real data / read data" a run performs; here it is "read data".

  • path_of_tsfile_to_read: path of the TsFile to read.

  • decomposeMeasureTime: FALSE measures the read process as a whole, in which case D_decompose_each_step is ignored. TRUE measures the decomposed read process, with the decomposition granularity controlled by D_decompose_each_step.

  • D_decompose_each_step: when decomposeMeasureTime is TRUE, D_decompose_each_step=FALSE measures the "(D_1)decompress_pageData" and "(D_2)decode_pageData" steps without further decomposition. D_decompose_each_step=TRUE breaks these two steps down further and measures the sub-steps inside them.

  • timeEncoding(te): if timeEncoding is not specified, TS_2DIFF is used by default. timeEncoding should be the same as the one used to write the TsFile.


Control parameters and the smallest unit steps measured under each setting:

  • decomposeMeasureTime=FALSE
    • Measured: total_time(us)
  • decomposeMeasureTime=TRUE & D_decompose_each_step=FALSE
    • Purpose: analyze the time shares of category A/B/C/D-1/D-2 operations (objective 1)
    • Measured:
      • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
      • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
      • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
      • (B)4_data_read_deserialize_ChunkHeader(us)
      • (B)5_data_read_ChunkData(us)
      • (C)6_data_deserialize_PageHeader(us)
      • (D-1)7_data_decompress_PageData(us)
      • (D-2)8_data_decode_PageData(us)
  • decomposeMeasureTime=TRUE & D_decompose_each_step=TRUE
    • Purpose: analyze the time shares of the sub-steps inside D-1 and inside D-2 (objective 2)
    • Measured:
      • (A)1_index_read_deserialize_MagicString_FileMetadataSize(us)
      • (A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)
      • (A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)
      • (B)4_data_read_deserialize_ChunkHeader(us)
      • (B)5_data_read_ChunkData(us)
      • (C)6_data_deserialize_PageHeader(us)
      • (D-1)7_1_data_ByteBuffer_to_ByteArray(us)
      • (D-1)7_2_data_decompress_PageDataByteArray(us)
      • (D-1)7_3_data_ByteArray_to_ByteBuffer(us)
      • (D-1)7_4_data_split_time_value_Buffer(us)
      • (D-2)8_1_createBatchData(us)
      • (D-2)8_2_timeDecoder_hasNext(us)
      • (D-2)8_3_timeDecoder_readLong(us)
      • (D-2)8_4_valueDecoder_read(us)
      • (D-2)8_5_checkValueSatisfyOrNot(us)
      • (D-2)8_6_putIntoBatchData(us)
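To make the three parameter combinations concrete, the helper below returns the list of measured steps for each setting, using the step names listed above. `measured_steps` is a hypothetical function written for illustration; it is not part of RLTsFileReadCostBench.

```python
COARSE_STEPS = [
    "(A)1_index_read_deserialize_MagicString_FileMetadataSize(us)",
    "(A)2_index_read_deserialize_IndexRootNode_MetaOffset_BloomFilter(us)",
    "(A)3_2_index_read_deserialize_IndexRootNode_exclude_to_TimeseriesMetadata_forExactGet(us)",
    "(B)4_data_read_deserialize_ChunkHeader(us)",
    "(B)5_data_read_ChunkData(us)",
    "(C)6_data_deserialize_PageHeader(us)",
]
# D measured as two undivided steps (D_decompose_each_step=FALSE).
D_COARSE = ["(D-1)7_data_decompress_PageData(us)", "(D-2)8_data_decode_PageData(us)"]
# D broken into sub-steps (D_decompose_each_step=TRUE).
D_FINE = [
    "(D-1)7_1_data_ByteBuffer_to_ByteArray(us)",
    "(D-1)7_2_data_decompress_PageDataByteArray(us)",
    "(D-1)7_3_data_ByteArray_to_ByteBuffer(us)",
    "(D-1)7_4_data_split_time_value_Buffer(us)",
    "(D-2)8_1_createBatchData(us)",
    "(D-2)8_2_timeDecoder_hasNext(us)",
    "(D-2)8_3_timeDecoder_readLong(us)",
    "(D-2)8_4_valueDecoder_read(us)",
    "(D-2)8_5_checkValueSatisfyOrNot(us)",
    "(D-2)8_6_putIntoBatchData(us)",
]


def measured_steps(decompose_measure_time, d_decompose_each_step):
    """Return the smallest measured steps for one parameter combination."""
    if not decompose_measure_time:
        # The read is timed as one span; D_decompose_each_step has no effect.
        return ["total_time(us)"]
    if not d_decompose_each_step:
        return COARSE_STEPS + D_COARSE
    return COARSE_STEPS + D_FINE
```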


In light of the experimental objectives,

...



Automation scripts

(1) RLUnitSynExp.sh: writes a TsFile with synthetic data, clears the system cache, and then runs one TsFile read experiment.

...

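  • Inputs:

    • Tool paths:

      • WRITE_READ_JAR_PATH: path to RLTsFileReadCostBench-0.13.1-jar-with-dependencies.jar

      • Calculator_JAR_PATH: path to RLRepeatReadResultAvgPercCalculator-0.13.1-jar-with-dependencies.jar, which computes averages and percentage statistics over the results of repeated read experiments

      • TOOL_PATH: path to RLtool.sh, an automation tool that replaces variable values in scripts

      • READ_SCRIPT_PATH: path to RLReadExpScripts.sh

    • Write parameters: see the RLTsFileReadCostBench write parameters

    • Read parameters: see the RLTsFileReadCostBench read parameters

    • REPEAT: number of repetitions of the read experiment
  • Outputs, for each compression method:
    • one TsFile;
    • one TsFile space-statistics file (*writeResult.csv);
    • REPEAT csv files of TsFile read times (*readResult-T*csv);
    • one csv file concatenating the repeated read results side by side (*readResult-combined.csv);
    • one csv file concatenating the write and read results (*allResult-combined.csv);
    • one csv file with the read results averaged and percentages computed at different granularities (*allResult-combined-processed.csv).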
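The calculator jar's source is not shown here. The sketch below assumes that each repeated run yields one time (in us) per step, and shows what the averaging and percentage computation might look like. `summarize_repeats`, `format_avg_per`, and the dict-per-run input are hypothetical. Percentages are computed within the group of steps passed in (for example, the five categories, or the sub-steps of D-1), which matches how each group in the result tables sums to 100%. The output string mimics the `X us - Y%` format of the `[Avg&Per]` rows.

```python
from statistics import mean


def summarize_repeats(runs):
    """Average per-step times over repeated runs and express each as a share.

    runs: list of dicts, one per repeat, mapping step name -> time in us.
    Returns {step: (avg_us, percent_of_group_total)}.
    """
    steps = list(runs[0])
    if any(set(r) != set(steps) for r in runs):
        raise ValueError("every run must report the same steps")
    avg = {s: mean(r[s] for r in runs) for s in steps}
    total = sum(avg.values())
    return {s: (avg[s], 100.0 * avg[s] / total) for s in steps}


def format_avg_per(avg_us, pct):
    """Render one cell in the same style as the [Avg&Per] rows."""
    return f"{avg_us} us - {pct}%"
```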

The other scripts are similar and are not described further.

II. Experimental Results


CRRC data results

Varying the compression method

RLCompressionRealExpScripts.sh
Compression | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | GORILLA | GORILLA | GORILLA
compression(co) | GZIP | LZ4 | SNAPPY | UNCOMPRESSED
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 19.34862614 | 23.77741051 | 23.15641212 | 36.30773735
chunkDataSize_stats_mean(MB) | 1.515139659 | 1.86007007 | 1.813184341 | 2.837062438
compressedPageSize_stats_mean(B) | 15824.16667 | 19440.275 | 18948.69 | 29684.7575
uncompressedPageSize_stats_mean(B) | 29684.7575 | 29684.7575 | 29684.7575 | 29684.7575
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 18221.26833 | 18221.26833 | 18221.26833

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 87316.85 us - 8.20% | 100289.64 us - 12.59% | 89466.60 us - 11.20% | 88760.78 us - 10.34%
[Avg&Per] (B)load_on_disk_chunk | 160699.60 us - 15.09% | 176784.88 us - 22.19% | 105802.94 us - 13.24% | 191518.85 us - 22.30%
[Avg&Per] (C)get_pageHeader | 2436.51 us - 0.23% | 2198.37 us - 0.28% | 2319.95 us - 0.29% | 3134.23 us - 0.36%
[Avg&Per] (D_1)decompress_pageData | 356587.55 us - 33.48% | 31158.16 us - 3.91% | 115629.72 us - 14.47% | 29640.98 us - 3.45%
[Avg&Per] (D_2)decode_pageData | 457950.97 us - 43.00% | 486085.02 us - 61.03% | 485623.25 us - 60.79% | 545723.81 us - 63.55%
SUM | 1064991.48 us | 796516.07 us | 798842.47 us | 858778.65 us

[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1269.76 us - 0.36% | 1808.58 us - 7.21% | 1687.99 us - 1.41% | 3197.63 us - 41.87%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 345856.23 us - 99.09% | 21247.39 us - 84.67% | 116100.70 us - 97.14% | 2432.46 us - 31.85%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 374.18 us - 0.11% | 442.11 us - 1.76% | 424.26 us - 0.35% | 421.19 us - 5.51%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1547.42 us - 0.44% | 1594.90 us - 6.36% | 1301.26 us - 1.09% | 1586.37 us - 20.77%

[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 3430.87 us - 0.23% | 4174.90 us - 0.28% | 3438.15 us - 0.23% | 3730.62 us - 0.25%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 234016.78 us - 15.59% | 236135.95 us - 15.56% | 235278.41 us - 15.65% | 234469.65 us - 15.54%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 357893.94 us - 23.84% | 360253.14 us - 23.74% | 358341.15 us - 23.84% | 363063.07 us - 24.06%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 353821.68 us - 23.57% | 359477.89 us - 23.69% | 356440.48 us - 23.71% | 355773.19 us - 23.58%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 223758.31 us - 14.91% | 224938.27 us - 14.83% | 224282.16 us - 14.92% | 226053.66 us - 14.98%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 328221.56 us - 21.86% | 332305.60 us - 21.90% | 325251.79 us - 21.64% | 325993.70 us - 21.60%
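The check below uses the rounded averages from the [2] rows of the CRRC compression table. For each compression method, it compares category B (load_on_disk_chunk) with category D (decompress_pageData plus decode_pageData). The variable and function names are introduced only for this illustration.

```python
# Rounded averages (us) from the [2] rows of the CRRC compression table.
LOAD_ON_DISK_CHUNK = {"GZIP": 160699.60, "LZ4": 176784.88, "SNAPPY": 105802.94, "UNCOMPRESSED": 191518.85}
DECOMPRESS_PAGE = {"GZIP": 356587.55, "LZ4": 31158.16, "SNAPPY": 115629.72, "UNCOMPRESSED": 29640.98}
DECODE_PAGE = {"GZIP": 457950.97, "LZ4": 486085.02, "SNAPPY": 485623.25, "UNCOMPRESSED": 545723.81}


def d_over_b(codec):
    """Ratio of category D time (D-1 + D-2) to category B time."""
    return (DECOMPRESS_PAGE[codec] + DECODE_PAGE[codec]) / LOAD_ON_DISK_CHUNK[codec]


for codec in LOAD_ON_DISK_CHUNK:
    print(f"{codec:<12} D/B = {d_over_b(codec):.2f}")
```

Every ratio is well above 1 (about 2.9 to 5.7). On this dataset, decompressing and decoding pages therefore costs more than loading chunks from disk. GZIP also has by far the largest D-1 time.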


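  • Compared with the other compression methods, GZIP achieves the highest compression ratio, and its D-1 decompression step also takes a larger share of the time.
  • The real dataset compresses well, so the on-disk data volume is small. Category D operations take longer than category B operations, i.e., the overall bottleneck lies in category D.
  • Because the real dataset compresses well, the main bottleneck inside D-1 is sub-step 7_2_data_decompress_PageDataByteArray(us).
    • In the synthetic-data experiment, sub-step 7_1_data_ByteBuffer_to_ByteArray(us) also took a large share. This is mainly because the synthetic data compress poorly, so 7_2_data_decompress_PageDataByteArray(us) takes little time and 7_1_data_ByteBuffer_to_ByteArray(us) takes a relatively larger share.
  • In this experiment, no sub-step inside D-2 is a prominent bottleneck.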
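The tradeoff in the first bullet can be made concrete with values from the CRRC compression table: mean page sizes and the SUM row. In this sketch, "compression ratio" is taken to mean mean uncompressed page size divided by mean compressed page size. That definition is an assumption made here, not one stated by the bench tool.

```python
# Values from the CRRC compression table (GORILLA values, 10000 points per page).
UNCOMPRESSED_PAGE_B = 29684.7575
COMPRESSED_PAGE_B = {"GZIP": 15824.16667, "LZ4": 19440.275, "SNAPPY": 18948.69, "UNCOMPRESSED": 29684.7575}
SUM_US = {"GZIP": 1064991.48, "LZ4": 796516.07, "SNAPPY": 798842.47, "UNCOMPRESSED": 858778.65}

# Compression ratio assumed as mean uncompressed / mean compressed page size.
ratio = {c: UNCOMPRESSED_PAGE_B / size for c, size in COMPRESSED_PAGE_B.items()}
best_ratio = max(ratio, key=ratio.get)
fastest_read = min(SUM_US, key=SUM_US.get)
print(f"highest compression ratio: {best_ratio} ({ratio[best_ratio]:.2f}x)")
print(f"smallest total read time: {fastest_read}")
```

In this run, the codec with the highest ratio (GZIP) is not the one with the smallest total read time (LZ4). This is consistent with the B-versus-D-1 tradeoff described above.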


Next step: repeat the experiment on a larger real dataset.

Varying the encoding method

RLValueEncodingRealExpScripts.sh
Encoding | GORILLA | PLAIN | RLE | TS_2DIFF
dataset | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv | /disk/rl/zc_data/ZT11529.csv
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 100 | 100 | 100 | 100
chunksWritten(cw) | 13 | 13 | 13 | 13
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | DOUBLE | DOUBLE | DOUBLE | DOUBLE
valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF
compression(co) | SNAPPY | SNAPPY | SNAPPY | SNAPPY
totalPointNum | 12780287 | 12780287 | 12780287 | 12780287
tsfileSize(MB) | 23.15641212 | 26.59960175 | 22.10147858 | 20.98550797
chunkDataSize_stats_mean(MB) | 1.813184341 | 2.081166744 | 1.740072568 | 1.642661015
compressedPageSize_stats_mean(B) | 18948.69 | 21758.61667 | 18182.29917 | 17160.8925
uncompressedPageSize_stats_mean(B) | 29684.7575 | 91463.48917 | 25245.93417 | 22210.36417
timeBufferSize_stats_mean(B) | 11461.4625 | 11461.4625 | 11461.4625 | 11461.4625
valueBufferSize_stats_mean(B) | 18221.26833 | 80000 | 13782.445 | 10746.875

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 88753.03 us - 11.21% | 97256.89 us - 12.18% | 84509.51 us - 9.50% | 86370.80 us - 11.67%
[Avg&Per] (B)load_on_disk_chunk | 105215.92 us - 13.29% | 138228.61 us - 17.31% | 139816.55 us - 15.71% | 96826.53 us - 13.08%
[Avg&Per] (C)get_pageHeader | 2352.16 us - 0.30% | 2625.53 us - 0.33% | 3388.63 us - 0.38% | 3159.26 us - 0.43%
[Avg&Per] (D_1)decompress_pageData | 114457.26 us - 14.46% | 182862.36 us - 22.90% | 155021.77 us - 17.42% | 120684.50 us - 16.30%
[Avg&Per] (D_2)decode_pageData | 480759.02 us - 60.74% | 377408.40 us - 47.27% | 506978.11 us - 56.98% | 433147.34 us - 58.52%
SUM | 791537.40 us | 798381.79 us | 889714.56 us | 740188.43 us

[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 1754.37 us - 1.69% | 2320.36 us - 1.49% | 1825.20 us - 1.55% | 1607.33 us - 1.76%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 100527.57 us - 96.61% | 151532.73 us - 97.29% | 113686.73 us - 96.46% | 87212.82 us - 95.40%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 432.00 us - 0.42% | 468.21 us - 0.30% | 474.29 us - 0.40% | 663.37 us - 0.73%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 1340.37 us - 1.29% | 1429.17 us - 0.92% | 1873.78 us - 1.59% | 1932.99 us - 2.11%

[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 3348.93 us - 0.22% | 3442.31 us - 0.24% | 3466.69 us - 0.22% | 3338.65 us - 0.22%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 235317.74 us - 15.59% | 244839.83 us - 17.30% | 239308.88 us - 15.38% | 240223.01 us - 15.97%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 361562.54 us - 23.95% | 363364.42 us - 25.67% | 362143.58 us - 23.27% | 357833.11 us - 23.79%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 356325.49 us - 23.60% | 241625.55 us - 17.07% | 396438.56 us - 25.48% | 335421.71 us - 22.30%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 225554.26 us - 14.94% | 232364.86 us - 16.42% | 224510.63 us - 14.43% | 230332.47 us - 15.31%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 327591.94 us - 21.70% | 329856.98 us - 23.30% | 330303.22 us - 21.23% | 337085.33 us - 22.41%
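The claims "7_2 dominates D-1" and "no prominent bottleneck inside D-2" can be checked against the rounded [3] shares of the CRRC value-encoding table. The 50% threshold for calling a sub-step a bottleneck is an assumption of this sketch, not a criterion stated in the experiment.

```python
# Rounded sub-step shares (%) from the [3] rows of the CRRC encoding table (SNAPPY).
D1_SHARES = {  # 7_1, 7_2, 7_3, 7_4
    "GORILLA": [1.69, 96.61, 0.42, 1.29],
    "PLAIN": [1.49, 97.29, 0.30, 0.92],
    "RLE": [1.55, 96.46, 0.40, 1.59],
    "TS_2DIFF": [1.76, 95.40, 0.73, 2.11],
}
D2_SHARES = {  # 8_1 .. 8_6
    "GORILLA": [0.22, 15.59, 23.95, 23.60, 14.94, 21.70],
    "PLAIN": [0.24, 17.30, 25.67, 17.07, 16.42, 23.30],
    "RLE": [0.22, 15.38, 23.27, 25.48, 14.43, 21.23],
    "TS_2DIFF": [0.22, 15.97, 23.79, 22.30, 15.31, 22.41],
}
BOTTLENECK_PCT = 50.0  # assumed threshold for "one dominant sub-step"


def has_bottleneck(shares):
    """True if a single sub-step takes at least BOTTLENECK_PCT of its group."""
    return max(shares) >= BOTTLENECK_PCT


for enc in D1_SHARES:
    print(enc, has_bottleneck(D1_SHARES[enc]), has_bottleneck(D2_SHARES[enc]))
```

With every encoding, D-1 has one dominant sub-step (7_2, about 95% to 97%). The largest D-2 sub-step stays at or below about 25.7%.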


Synthetic data results

Varying the compression method

RLCompressionSynExpScripts.sh
Compression | GZIP | LZ4 | SNAPPY | UNCOMPRESSED

...


Varying the encoding method

Synthetic data results

RLValueEncodingSynExpScripts.sh
Encoding | GORILLA | PLAIN | RLE | TS_2DIFF
dataset | synthetic | synthetic | synthetic | synthetic
pagePointNum(ppn) | 10000 | 10000 | 10000 | 10000
numOfPagesInChunk(pic) | 1000 | 1000 | 1000 | 1000
chunksWritten(cw) | 10 | 10 | 10 | 10
timeEncoding(te) | TS_2DIFF | TS_2DIFF | TS_2DIFF | TS_2DIFF
valueDataType(vt) | INT64 | INT64 | INT64 | INT64
valueEncoding(ve) | GORILLA | PLAIN | RLE | TS_2DIFF
compression(co) | UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED | UNCOMPRESSED
totalPointNum | 100000000 | 100000000 | 100000000 | 100000000
tsfileSize(MB) | 805.3812895 | 781.4226151 | 781.8422318 | 793.3244705
chunkDataSize_stats_mean(MB) | 80.53803624 | 78.14216614 | 78.18412781 | 79.33235168
compressedPageSize_stats_mean(B) | 84386.25189 | 81874 | 81918 | 83122
uncompressedPageSize_stats_mean(B) | 84386.25189 | 81874 | 81918 | 83122
timeBufferSize_stats_mean(B) | 1872 | 1872 | 1872 | 1872
valueBufferSize_stats_mean(B) | 82512.25189 | 80000 | 80044 | 81248

[2] category: (A)get ChunkStatistic->(B)load on-disk Chunk->(C)get PageStatistics->(D)load in-memory PageData
[Avg&Per] (A)get_chunkMetadatas | 91331.98 us - 0.86% | 100944.76 us - 1.29% | 88098.20 us - 0.97% | 88231.22 us - 0.93%
[Avg&Per] (B)load_on_disk_chunk | 5552645.94 us - 52.17% | 5170158.38 us - 66.27% | 5270914.36 us - 57.87% | 5526099.62 us - 57.98%
[Avg&Per] (C)get_pageHeader | 8185.81 us - 0.08% | 7712.40 us - 0.10% | 7813.91 us - 0.09% | 7725.99 us - 0.08%
[Avg&Per] (D_1)decompress_pageData | 548160.34 us - 5.15% | 525441.03 us - 6.74% | 585036.24 us - 6.42% | 632154.65 us - 6.63%
[Avg&Per] (D_2)decode_pageData | 4443785.97 us - 41.75% | 1996996.82 us - 25.60% | 3156902.09 us - 34.66% | 3276856.42 us - 34.38%

[3] D_1 compare each step inside
[Avg&Per] (D-1)7_1_data_ByteBuffer_to_ByteArray(us) | 110421.72 us - 64.93% | 109687.92 us - 63.93% | 113658.36 us - 62.90% | 109187.48 us - 63.08%
[Avg&Per] (D-1)7_2_data_decompress_PageDataByteArray(us) | 54624.26 us - 32.12% | 57095.56 us - 33.28% | 62164.88 us - 34.40% | 59072.26 us - 34.13%
[Avg&Per] (D-1)7_3_data_ByteArray_to_ByteBuffer(us) | 1179.34 us - 0.69% | 1234.10 us - 0.72% | 1032.12 us - 0.57% | 1193.27 us - 0.69%
[Avg&Per] (D-1)7_4_data_split_time_value_Buffer(us) | 3833.29 us - 2.25% | 3567.02 us - 2.08% | 3835.98 us - 2.12% | 3629.10 us - 2.10%

[3] D_2 compare each step inside
[Avg&Per] (D-2)8_1_createBatchData(us) | 6008.93 us - 0.05% | 6005.09 us - 0.06% | 9136.20 us - 0.08% | 6219.25 us - 0.05%
[Avg&Per] (D-2)8_2_timeDecoder_hasNext(us) | 1795067.45 us - 14.10% | 1862631.87 us - 18.29% | 1805100.28 us - 15.73% | 1838661.87 us - 15.73%
[Avg&Per] (D-2)8_3_timeDecoder_readLong(us) | 2073615.21 us - 16.29% | 2089493.88 us - 20.51% | 2063469.38 us - 17.98% | 2172029.55 us - 18.58%
[Avg&Per] (D-2)8_4_valueDecoder_read(us) | 4636195.41 us - 36.42% | 1880348.90 us - 18.46% | 3352494.92 us - 29.21% | 3364558.25 us - 28.79%
[Avg&Per] (D-2)8_5_checkValueSatisfyOrNot(us) | 1724239.44 us - 13.55% | 1784205.29 us - 17.52% | 1723128.80 us - 15.01% | 1780807.32 us - 15.24%
[Avg&Per] (D-2)8_6_putIntoBatchData(us) | 2494168.59 us - 19.59% | 2563425.33 us - 25.17% | 2525393.94 us - 22.00% | 2525103.53 us - 21.61%

...


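Category B operations take longer than category D operations. Reason: the synthetic values are random integers of type INT64 with PLAIN encoding, and none of the four compression methods achieves a high compression ratio on them, so the on-disk data volume is large.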
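The "B exceeds D" statement can be checked against the rounded category shares in the [2] rows of the synthetic value-encoding table (all UNCOMPRESSED). The dictionaries and the `b_exceeds_d` helper are introduced only for this illustration.

```python
# Rounded [2] shares (%) from the synthetic value-encoding table.
B_PCT = {"GORILLA": 52.17, "PLAIN": 66.27, "RLE": 57.87, "TS_2DIFF": 57.98}
D1_PCT = {"GORILLA": 5.15, "PLAIN": 6.74, "RLE": 6.42, "TS_2DIFF": 6.63}
D2_PCT = {"GORILLA": 41.75, "PLAIN": 25.60, "RLE": 34.66, "TS_2DIFF": 34.38}


def b_exceeds_d(enc):
    """True if load_on_disk_chunk takes a larger share than D-1 + D-2 combined."""
    return B_PCT[enc] > D1_PCT[enc] + D2_PCT[enc]


for enc in B_PCT:
    print(enc, b_exceeds_d(enc))
```

For every encoding, B exceeds D-1 + D-2. This is the opposite of the CRRC results, where D dominates.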

...



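  • Overall, the synthetic dataset is a poor choice: its randomly generated values encode and compress poorly, to the point that PLAIN encoding takes the least space.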