...

This experiment aims to determine the optimal value of the max_degree_of_index_node configuration item for the MetadataIndexTree.

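Columns labeled 64, 128, 256 (default), 512 and 1024 are values of max_degree_of_index_node.

| Sensors | Devices | 64 size | 64 raw query | 64 agg. query | 128 size | 128 raw query | 128 agg. query | 256 (default) size | 256 (default) raw query | 256 (default) agg. query | 512 size | 512 raw query | 512 agg. query | 1024 size | 1024 raw query | 1024 agg. query | Files |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 102 | 3.13 | 1.9 | 102 | 2.29 | 1.34 | 102 | 3.13 | 2.69 | 102 | 3.4 | 1.78 | 102 | 2.62 | 1.44 | 100 |
| 10 | 1000 | 54176 | 17.32 | 6.81 | 53951 | 16.52 | 6.87 | 53832 | 16.9 | 7.99 | 53772 | 16.7 | 9.18 | 53716 | 15.96 | 8.26 | 100 |
| 10 | 10000 | 551642 | 34.26 | 20.49 | 549300 | 31.76 | 21.25 | 548095 | 31.72 | 18.31 | 547478 | 33.95 | 20.3 | 547169 | 32.27 | 19.26 | 100 |
| 10 | 100000 | 5617377 | 118.1 | 50.7 | 5593480 | 117.3 | 50.9 | 5580851 | 100.1 | 49.3 | 5574575 | 109.2 | 50.6 | 5571449 | 100.3 | 46.6 | 10 |
| 10 | 1000000 | 57189153 | 637 | 162 | 56941946 | 498 | 173 | 56811965 | 618 | 183 | 56747338 | 409 | 129 | 56715083 | 457 | 121 | 1 |
| 500 | 1 | 592 | 5.1 | 3.34 | 517 | 5.29 | 3.22 | 479 | 5.24 | 3.37 | 461 | 6.77 | 3.8 | 461 | 5.49 | 3.1 | 100 |
| 500 | 1000 | 567086 | 53.6 | 27.8 | 491861 | 48.8 | 26.4 | 453742 | 43.6 | 23.4 | 435682 | 46.1 | 25.6 | 435626 | 42.6 | 22.6 | 10 |
| 500 | 10000 | 5680719 | 177.5 | 83.5 | 4928377 | 139 | 80 | 4547172 | 141 | 79 | 4366555 | 140 | 76 | 4366246 | 155 | 93 | 2 |
| 500 | 20000 | 11372629 | 360 | 130 | 9867943 | 287 | 142 | 9105435 | 306 | 133 | 8744207 | 262 | 152 | 8743579 | 252 | 148 | 1 |
| 100000 | 1 | 111416 | 21.34 | 12.03 | 94548 | 20.11 | 13.98 | 86219 | 19.92 | 12.31 | 82088 | 20.63 | 12.3 | 80040 | 20.87 | 11.45 | 100 |
| 100000 | 100 | 11138809 | 323 | 211 | 9451954 | 290 | 195 | 8619054 | 333 | 192 | 8205954 | 251 | 185 | 8001154 | 256 | 160 | 1 |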

...

(VIII) Best configuration for the B+ Tree

Jira: IOTDB-2118 (ASF JIRA)

This experiment aims to determine the optimal value of the max_degree_of_index_node configuration item for the B+ Tree.


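Columns labeled 64, 128, 256, 512 and 1024 are values of max_degree_of_index_node.

| Sensors | Devices | 64 size | 64 raw query | 64 agg. query | 128 size | 128 raw query | 128 agg. query | 256 size | 256 raw query | 256 agg. query | 512 size | 512 raw query | 512 agg. query | 1024 size | 1024 raw query | 1024 agg. query | Files |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 92 | 2.2 | 1.31 | 92 | 5.55 | 2.25 | 92 | 3.58 | 1.98 | 92 | 3.39 | 1.65 | 92 | 2.79 | 1.98 | 100 |
| 10 | 1000 | 36328 | 14.93 | 6 | 36031 | 16.7 | 7.36 | 35876 | 15.88 | 6.75 | 35798 | 16.84 | 8.48 | 35724 | 18.66 | 7.71 | 100 |
| 10 | 10000 | 373090 | 26.08 | 15.94 | 370019 | 25.08 | 19.35 | 368463 | 25.54 | 15.38 | 367666 | 23.22 | 16.4 | 367267 | 26.62 | 16.94 | 100 |
| 10 | 100000 | 3831677 | 73.6 | 42.8 | 3800589 | 77.9 | 45.4 | 3784396 | 69.5 | 41.5 | 3776347 | 58.4 | 44.3 | 3772339 | 63.5 | 40.8 | 10 |
| 10 | 1000000 | 39332027 | 258 | 194 | 39012829 | 289 | 158 | 38847280 | 369 | 125 | 38764968 | 296 | 118 | 38723884 | 194 | 132 | 1 |
| 500 | 1 | 645 | 4.56 | 3.13 | 534 | 5.77 | 3.86 | 478 | 4.98 | 3.1 | 451 | 5.06 | 2.81 | 451 | 8.67 | 3.03 | 100 |
| 500 | 1000 | 629692 | 38.9 | 37.6 | 507555 | 45.3 | 32.4 | 446835 | 46.9 | 26.2 | 417708 | 52.4 | 20.8 | 417634 | 35.8 | 16.8 | 10 |
| 500 | 10000 | 6377591 | 149 | 85.5 | 5115229 | 153.5 | 93.5 | 4487990 | 167.5 | 80 | 4186743 | 147 | 93 | 4186344 | 134.5 | 75.5 | 2 |
| 500 | 20000 | 12845401 | 238 | 155 | 10275117 | 316 | 148 | 8998201 | 229 | 125 | 8384575 | 259 | 131 | 8383767 | 264 | 131 | 1 |
| 100000 | 1 | 125689 | 21.19 | 13.5 | 101630 | 20.14 | 14.08 | 89737 | 23.26 | 15 | 83833 | 18.48 | 9.93 | 80903 | 21.64 | 13.88 | 100 |
| 100000 | 100 | 12710845 | 340 | 211 | 10228850 | 311 | 279 | 9004533 | 247 | 219 | 8399129 | 277 | 199 | 8096000 | 321 | 158 | 1 |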

...

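Raw data query time of the Zesong tree (泽嵩树) and the B+ tree under different scenarios and configurations:

| Sensors | Devices | Zesong 64 | Zesong 128 | Zesong 256 | Zesong 512 | Zesong 1024 | Zesong best time | B+ 64 | B+ 128 | B+ 256 | B+ 512 | B+ 1024 | B+ best time | Time difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 3.13 | 2.29 | 3.13 | 3.4 | 2.62 | 2.29 | 2.2 | 5.55 | 3.58 | 3.39 | 2.79 | 2.2 | 3.93% |
| 10 | 1000 | 17.32 | 16.52 | 16.9 | 16.7 | 15.96 | 15.96 | 14.93 | 16.7 | 15.88 | 16.84 | 18.66 | 14.93 | 6.45% |
| 10 | 10000 | 34.26 | 31.76 | 31.72 | 33.95 | 32.27 | 31.72 | 26.08 | 25.08 | 25.54 | 23.22 | 26.62 | 23.22 | 26.80% |
| 10 | 100000 | 118.1 | 117.3 | 100.1 | 109.2 | 100.3 | 100.1 | 73.6 | 77.9 | 69.5 | 58.4 | 63.5 | 58.4 | 41.66% |
| 10 | 1000000 | 637 | 498 | 618 | 409 | 457 | 409 | 258 | 289 | 369 | 296 | 194 | 194 | 52.57% |
| 500 | 1 | 5.1 | 5.29 | 5.24 | 6.77 | 5.49 | 5.1 | 4.56 | 5.77 | 4.98 | 5.06 | 8.67 | 4.56 | 10.59% |
| 500 | 1000 | 53.6 | 48.8 | 43.6 | 46.1 | 42.6 | 42.6 | 38.9 | 45.3 | 46.9 | 52.4 | 35.8 | 35.8 | 15.96% |
| 500 | 10000 | 177.5 | 139 | 141 | 140 | 155 | 139 | 149 | 153.5 | 167.5 | 147 | 134.5 | 134.5 | 3.24% |
| 500 | 20000 | 360 | 287 | 306 | 262 | 252 | 252 | 238 | 316 | 229 | 259 | 264 | 229 | 9.13% |
| 100000 | 1 | 21.34 | 20.11 | 19.92 | 20.63 | 20.87 | 19.92 | 21.19 | 20.14 | 23.26 | 18.48 | 21.64 | 18.48 | 7.23% |
| 100000 | 100 | 323 | 290 | 333 | 251 | 256 | 251 | 340 | 311 | 247 | 277 | 321 | 247 | 1.59% |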
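The time difference column is not defined in the text. It appears to be the relative gap between the two best-configuration times, (best Zesong time - best B+ time) / best Zesong time, and this formula reproduces the tabulated values. A minimal sketch under that assumption (the function name `time_difference` is hypothetical, not part of IoTDB):

```python
def time_difference(first_times, second_times):
    """Return (best_first, best_second, relative difference).

    The best time of a structure is its minimum over all tested
    max_degree_of_index_node values; the difference is taken
    relative to the first structure's best time.
    """
    best_first = min(first_times)
    best_second = min(second_times)
    return best_first, best_second, (best_first - best_second) / best_first


# Raw-query times for the 10-sensor, 100000-device row (configs 64..1024).
zesong_times = [118.1, 117.3, 100.1, 109.2, 100.3]
bplus_times = [73.6, 77.9, 69.5, 58.4, 63.5]

best_zesong, best_bplus, diff = time_difference(zesong_times, bplus_times)
print(f"{best_zesong} vs {best_bplus}: {diff:.2%}")
```

A positive value means the second structure (here the B+ tree) is faster; a negative value means the first is faster.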


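Aggregation query time:

| Sensors | Devices | Zesong 64 | Zesong 128 | Zesong 256 | Zesong 512 | Zesong 1024 | Zesong best time | B+ 64 | B+ 128 | B+ 256 | B+ 512 | B+ 1024 | B+ best time | Time difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 1.9 | 1.34 | 2.69 | 1.78 | 1.44 | 1.34 | 1.31 | 2.25 | 1.98 | 1.65 | 1.98 | 1.31 | 2.24% |
| 10 | 1000 | 6.81 | 6.87 | 7.99 | 9.18 | 8.26 | 6.81 | 6 | 7.36 | 6.75 | 8.48 | 7.71 | 6 | 11.89% |
| 10 | 10000 | 20.49 | 21.25 | 18.31 | 20.3 | 19.26 | 18.31 | 15.94 | 19.35 | 15.38 | 16.4 | 16.94 | 15.38 | 16.00% |
| 10 | 100000 | 50.7 | 50.9 | 49.3 | 50.6 | 46.6 | 46.6 | 42.8 | 45.4 | 41.5 | 44.3 | 40.8 | 40.8 | 12.45% |
| 10 | 1000000 | 162 | 173 | 183 | 129 | 121 | 121 | 194 | 158 | 125 | 118 | 132 | 118 | 2.48% |
| 500 | 1 | 3.34 | 3.22 | 3.37 | 3.8 | 3.1 | 3.1 | 3.13 | 3.86 | 3.1 | 2.81 | 3.03 | 2.81 | 9.35% |
| 500 | 1000 | 27.8 | 26.4 | 23.4 | 25.6 | 22.6 | 22.6 | 37.6 | 32.4 | 26.2 | 20.8 | 16.8 | 16.8 | 25.66% |
| 500 | 10000 | 83.5 | 80 | 79 | 76 | 93 | 76 | 85.5 | 93.5 | 80 | 93 | 75.5 | 75.5 | 0.66% |
| 500 | 20000 | 130 | 142 | 133 | 152 | 148 | 130 | 155 | 148 | 125 | 131 | 131 | 125 | 3.85% |
| 100000 | 1 | 12.03 | 13.98 | 12.31 | 12.3 | 11.45 | 11.45 | 13.5 | 14.08 | 15 | 9.93 | 13.88 | 9.93 | 13.28% |
| 100000 | 100 | 211 | 195 | 192 | 185 | 160 | 160 | 211 | 279 | 219 | 199 | 158 | 158 | 1.25% |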

...

Taking the index file size in the 10-sensor, 1000000-device scenario as an example, as shown in the figure:

(IX) Best configuration for the Hash-structured TsFileMetadata

Jira: IOTDB-2080 (ASF JIRA)

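This experiment aims to determine the optimal value of the configuration item for the Hash-structured TsFileMetadata. The configuration item is the estimated maximum number of entries a bucket can hold.

Note: this configuration item is not the maximum number of entries a bucket actually holds.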

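Columns labeled 64, 128, 256, 512 and 1024 are values of the configuration item.

| Sensors | Devices | 64 size | 64 raw query | 64 agg. query | 128 size | 128 raw query | 128 agg. query | 256 size | 256 raw query | 256 agg. query | 512 size | 512 raw query | 512 agg. query | 1024 size | 1024 raw query | 1024 agg. query | Files |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 323 | 2.38 | 0.8 | 323 | 3.24 | 1.43 | 323 | 2.68 | 1.61 | 323 | 3.59 | 2.09 | 323 | 2.88 | 1.24 | 100 |
| 10 | 1000 | 358554 | 21.49 | 13.01 | 354467 | 21 | 12.4 | 322415 | 20.3 | 14.41 | 321055 | 21.42 | 13.81 | 299216 | 20.83 | 13.1 | 100 |
| 10 | 10000 | 3690057 | 43.84 | 36.18 | 3859716 | 45.06 | 37.32 | 3831173 | 42.67 | 37.71 | 3192012 | 41 | 32.97 | 3137622 | 40.9 | 33.8 | 100 |
| 10 | 100000 | 42029427 | 51.9 | 44.8 | 37094251 | 51.1 | 34.1 | 39810357 | 55.2 | 42.8 | 35431664 | 49.8 | 37.6 | 33074163 | 44.9 | 36.5 | 10 |
| 10 | 1000000 | 467325307 | 265 | 131 | 445059682 | 231 | 108 | 360181380 | 247 | 139 | 359116142 | 203 | 117 | 344193693 | 227 | 116 | 1 |
| 500 | 1 | 15370 | 6.01 | 4.01 | 15030 | 8.06 | 3.92 | 14862 | 7.93 | 4.31 | 14804 | 9.4 | 4.44 | 14804 | 7.73 | 4.69 | 100 |
| 500 | 1000 | 22453638 | 54.8 | 37.2 | 18381461 | 42.7 | 28.8 | 17340676 | 44.4 | 31.4 | 16831660 | 47.5 | 36.5 | 16258755 | 46.9 | 33 | 10 |
| 500 | 10000 | 295537667 | 128 | 93 | 202805838 | 126 | 78 | 196892734 | 155 | 82.5 | 175055958 | 143 | 77 | 171447421 | 129 | 70 | 2 |
| 500 | 20000 | 612012806 | 298 | 165 | 471700306 | 260 | 126 | 382095722 | 243 | 140 | 393160417 | 239 | 170 | 348939968 | 259 | 137 | 1 |
| 100000 | 1 | 4085496 | 44.01 | 35.76 | 3361582 | 44.19 | 35.09 | 3334603 | 46.76 | 38.2 | 3330193 | 40.5 | 33.65 | 3284622 | 42.96 | 33.44 | 100 |
| 100000 | 100 | 484356557 | 290 | 139 | 422794057 | 253 | 162 | 380728518 | 237 | 173 | 391851774 | 225 | 189 | 348139157 | 241 | 158 | 1 |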


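Combining experiments (VIII) and (IX):

Raw data query time of the B+ tree and the Hash-structured index under different scenarios and configurations:

| Sensors | Devices | Hash 64 | Hash 128 | Hash 256 | Hash 512 | Hash 1024 | Hash best time | B+ 64 | B+ 128 | B+ 256 | B+ 512 | B+ 1024 | B+ best time | Time difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 2.38 | 3.24 | 2.68 | 3.59 | 2.88 | 2.38 | 2.65 | 4.3 | 2.85 | 2.35 | 3.14 | 2.35 | 1.26% |
| 10 | 1000 | 21.49 | 21 | 20.3 | 21.42 | 20.83 | 20.3 | 15.35 | 15.38 | 15.62 | 16.4 | 17.17 | 15.35 | 24.38% |
| 10 | 10000 | 43.84 | 45.06 | 42.67 | 41 | 40.9 | 40.9 | 25.67 | 26.08 | 23.46 | 23.1 | 24.11 | 23.1 | 43.52% |
| 10 | 100000 | 51.9 | 51.1 | 55.2 | 49.8 | 44.9 | 44.9 | 78 | 65.4 | 57.5 | 54.9 | 59.8 | 54.9 | -22.27% |
| 10 | 1000000 | 265 | 231 | 247 | 203 | 227 | 203 | 327 | 361 | 293 | 226 | 252 | 226 | -11.33% |
| 500 | 1 | 6.01 | 8.06 | 7.93 | 9.4 | 7.73 | 6.01 | 4.61 | 4.93 | 4.56 | 4.57 | 8.71 | 4.56 | 24.13% |
| 500 | 1000 | 54.8 | 42.7 | 44.4 | 47.5 | 46.9 | 42.7 | 42.7 | 44.7 | 43.9 | 39.3 | 39.7 | 39.3 | 7.96% |
| 500 | 10000 | 128 | 126 | 155 | 143 | 129 | 126 | 155.5 | 170.5 | 151.5 | 152 | 131.5 | 131.5 | -4.37% |
| 500 | 20000 | 298 | 260 | 243 | 239 | 259 | 239 | 261 | 272 | 248 | 232 | 233 | 232 | 2.93% |
| 100000 | 1 | 44.01 | 44.19 | 46.76 | 40.5 | 42.96 | 40.5 | 21.04 | 19.43 | 20.64 | 17.43 | 20.9 | 17.43 | 56.96% |
| 100000 | 100 | 290 | 253 | 237 | 225 | 241 | 225 | 339 | 289 | 310 | 303 | 290 | 289 | -28.44% |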


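Aggregation query time:

| Sensors | Devices | Hash 64 | Hash 128 | Hash 256 | Hash 512 | Hash 1024 | Hash best time | B+ 64 | B+ 128 | B+ 256 | B+ 512 | B+ 1024 | B+ best time | Time difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 0.8 | 1.43 | 1.61 | 2.09 | 1.24 | 0.8 | 1.12 | 2.73 | 2.27 | 1.74 | 2 | 1.12 | -40.00% |
| 10 | 1000 | 13.01 | 12.4 | 14.41 | 13.81 | 13.1 | 12.4 | 6.15 | 7.41 | 7.78 | 7.05 | 7.85 | 6.15 | 50.40% |
| 10 | 10000 | 36.18 | 37.32 | 37.71 | 32.97 | 33.8 | 32.97 | 15.81 | 19.44 | 14.97 | 15.4 | 17.16 | 14.97 | 54.60% |
| 10 | 100000 | 44.8 | 34.1 | 42.8 | 37.6 | 36.5 | 34.1 | 43.6 | 46.7 | 43 | 40.4 | 44.3 | 40.4 | -18.48% |
| 10 | 1000000 | 131 | 108 | 139 | 117 | 116 | 108 | 173 | 153 | 176 | 206 | 162 | 153 | -41.67% |
| 500 | 1 | 4.01 | 3.92 | 4.31 | 4.44 | 4.69 | 3.92 | 3.38 | 3.77 | 3.19 | 2.72 | 2.93 | 2.72 | 30.61% |
| 500 | 1000 | 37.2 | 28.8 | 31.4 | 36.5 | 33 | 28.8 | 26.3 | 25.7 | 24.7 | 20.3 | 19.4 | 19.4 | 32.64% |
| 500 | 10000 | 93 | 78 | 82.5 | 77 | 70 | 70 | 98 | 88 | 112 | 89 | 87.5 | 87.5 | -25.00% |
| 500 | 20000 | 165 | 126 | 140 | 170 | 137 | 126 | 155 | 148 | 125 | 131 | 131 | 125 | 0.79% |
| 100000 | 1 | 35.76 | 35.09 | 38.2 | 33.65 | 33.44 | 33.44 | 13.17 | 13.01 | 14.93 | 10.61 | 14.31 | 10.61 | 68.27% |
| 100000 | 100 | 139 | 162 | 173 | 189 | 158 | 139 | 275 | 240 | 149 | 231 | 174 | 149 | -7.19% |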


As shown in the figures:

[Image]

[Image]

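In terms of query speed, the Hash-structured index performs better when the number of time series is large (more than 5,000,000). A query against the Hash-structured index always needs exactly two I/Os: the first obtains the number and sizes of the buckets, and the second deserializes the corresponding bucket. The B+ tree index, by contrast, may need only one I/O when the number of time series is small, and even when it needs two I/Os, the amount of data it deserializes may be smaller than a bucket. Therefore, in most scenarios the B+ tree index has the advantage in both raw data query speed and aggregation query speed.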
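To see where the crossover falls in the measured data, the raw-query rows can be grouped by time-series count (sensors × devices). The sketch below assumes that a negative time difference means the Hash structure's best time is lower than the B+ tree's; the values are copied from the raw data query comparison of Hash and B+ tree above, and the variable names are illustrative only.

```python
# (sensors, devices, raw-query time difference in percent), Hash vs. B+ tree.
ROWS = [
    (10, 1, 1.26), (10, 1000, 24.38), (10, 10000, 43.52),
    (10, 100000, -22.27), (10, 1000000, -11.33),
    (500, 1, 24.13), (500, 1000, 7.96), (500, 10000, -4.37),
    (500, 20000, 2.93),
    (100000, 1, 56.96), (100000, 100, -28.44),
]

# Time-series counts of the scenarios won by each structure.
hash_faster = sorted(s * d for s, d, diff in ROWS if diff < 0)
bplus_faster = sorted(s * d for s, d, diff in ROWS if diff >= 0)

print("Hash faster at series counts:", hash_faster)
print("B+ tree faster at series counts:", bplus_faster)
```

In these measurements the Hash structure wins only at 1,000,000 time series or more, while the B+ tree wins every smaller scenario and one of the 10,000,000-series scenarios, so the threshold is approximate rather than sharp.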

[Image]

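In terms of index file size, the index area of the Hash structure is far larger than the index areas of the B+ tree and the Zesong tree. The ratio grows as the number of sensors increases; for example, in the 100000-sensor scenario the ratio of index area sizes can reach 43:1.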


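Take the 10-sensor, 1000-device scenario with the configuration item set to 1024 as an example:

In the Hash structure, the index area must store the names of all time series. Each bucket is 29140 bytes and there are 10 buckets (10 × 1000 ÷ 1024, rounded up, is 10), for a total of about 284.57 KB. Including the BloomFilter, the total is about 292.2 KB.

In the B+ tree structure, sparse indexing already starts at the leaf node level, so the index area does not store every time series. In this scenario each device has 10 sensors, so the number of time series stored is 1/10 of the total. The index area totals about 34.89 KB.

The index area of the Hash structure is about 8.38 times the size of the B+ tree index area. Excluding the BloomFilter, the index area of the Hash structure is about 10.44 times the size of the B+ tree index area (each entry additionally records a size of type VARINT).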
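The arithmetic above can be checked with a short sketch. The byte counts 299216 (Hash) and 35724 (B+ tree) are the measured sizes for this scenario at configuration 1024. The 10.44 figure is reproduced here under the assumption that the same BloomFilter size is subtracted from both structures; the text does not state this explicitly.

```python
import math

SENSORS, DEVICES, MAX_ENTRIES_PER_BUCKET = 10, 1000, 1024
BUCKET_BYTES = 29140        # size of one bucket, from the text
HASH_INDEX_BYTES = 299216   # measured Hash index area, BloomFilter included
BPLUS_INDEX_BYTES = 35724   # measured B+ tree index area, BloomFilter included

# Number of buckets: time series count divided by the config value, rounded up.
buckets = math.ceil(SENSORS * DEVICES / MAX_ENTRIES_PER_BUCKET)

hash_without_bloom = buckets * BUCKET_BYTES
bloom_bytes = HASH_INDEX_BYTES - hash_without_bloom

ratio_with_bloom = HASH_INDEX_BYTES / BPLUS_INDEX_BYTES
# Assumption: the B+ tree file carries a BloomFilter of the same size.
ratio_without_bloom = hash_without_bloom / (BPLUS_INDEX_BYTES - bloom_bytes)

print(f"buckets: {buckets}")
print(f"Hash without BloomFilter: {hash_without_bloom / 1024:.2f} KB")
print(f"Hash with BloomFilter:    {HASH_INDEX_BYTES / 1024:.2f} KB")
print(f"B+ tree:                  {BPLUS_INDEX_BYTES / 1024:.2f} KB")
print(f"ratio with BloomFilter:    {ratio_with_bloom:.2f}")
print(f"ratio without BloomFilter: {ratio_without_bloom:.2f}")
```

The ratio without the BloomFilter sits slightly above 10 because the Hash structure stores ten times as many entries and each entry also carries the extra VARINT size.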