Apache Kylin : Analytical Data Warehouse for Big Data
Page History
Table of Contents |
---|
...
01 背景
1. 概述
数据存在很少的部分结果的更新,如果刷新 segment,耗时通常在数分钟左右,用户等不及;希望在数据更新后几秒内,Kylin就可以在查询结果中反映到数据变更(无需刷新 segment)。
...
order_id | buyer_id | part_dt | price | count |
order_001 | zhang | 2021-01-01 | -200 | -1 |
order_001 | zhang | 2021-01-02 | +200 | +1 |
...
02 开发方案 Merge on Read
1. 思路
在每次查询的时候,在 TableScan 算子执行过程中,将 delta 数据文件按照击中的 Cuboid 进行 aggregation,然后和 Cuboid Dataframe 进行合并。
其中合并 Delta Cuboid 和 Original Cuboid 的处理,根据记录更新类型的不同,可能稍微有点复杂。比方删除记录和查询记录的行为得分开处理。
2. 流程图
...
03 代码实现及测试结果
1. 代码实现
代码分支:https://github.com/zhangayqian/kylin/tree/delta-csv-datasource
...
| Query | kylin.query.spark-conf.spark.executor.cores=3 | kylin.query.spark-conf.spark.executor.cores=1 | ||||
1并发 | 10并发 | 20并发 | 1并发 | 10并发 | 20并发/err | ||
0 | Q1.1 | 178.69 | 475.86 | 834.46 | 631.82 | 2826.32 | 12850.18/3.95% |
Q1.2 | 142.40 | 299.24 | 592.78 | 790.59 | 2229.30 | 4343.96/2.05% | |
Q1.3 | 147.73 | 305.32 | 593.30 | 437.71 | 2115.67 | 2794.34/0.70% | |
Q2.1 | 176.78 | 376.71 | 727.96 | 441.27 | 2220.17 | 2304.61/1.41% | |
Q2.2 | 172.78 | 364.76 | 716.69 | 441.03 | 1893.13 | 2119.11 | |
Q2.3 | 171.09 | 358.08 | 719.93 | 476.31 | 1827.40 | 2105.86 | |
Q3.1 | 170.63 | 351.84 | 720.70 | 465.03 | 1980.37 | 2363.31 | |
Q3.2 | 230.57 | 443.34 | 787.17 | 552.73 | 2239.47 | 2997.98 | |
Q3.3 | 226.74 | 428.90 | 770.51 | 506.64 | 2360.32 | 4290.75 | |
Q3.4 | 742.20 | 1168.66 | 1849.99 | 971.21 | 3059.67 | 6440.36 | |
Q4.1 | 203.03 | 386.46 | 769.75 | 457.79 | 2724.21 | 6273.74/1.43% | |
Q4.2 | 196.96 | 416.01 | 774.25 | 488.17 | 3035.03 | 6595.58/1.45% | |
Q4.3 | 1049.04 | 1636.10 | 2648.57 | 1747.15 | 4212.71 | 12277.43/2.94% | |
Total | 292.97 | 539.07 | 961.17 | 631.82 | 2513.60 | 5235.15/1.09% | |
1000 | Q1.1 | 476.85 | 936.78 | 2173.84 | 589.04 | 3576.84 | 10766.92/2.17% |
Q1.2 | 354.13 | 703.09 | 1399.27 | 404.63 | 2431.06 | 4189.42 | |
Q1.3 | 386.30 | 741.74 | 1430.06 | 425.39 | 1971.07 | 2970.18 | |
Q2.1 | 396.13 | 789.61 | 1576.43 | 415.97 | 1747.05 | 2487.82 | |
Q2.2 | 398.13 | 779.49 | 1548.78 | 426.37 | 1489.50 | 2283.08 | |
Q2.3 | 396.22 | 773.03 | 1542.58 | 404.89 | 1337.23 | 2261.86 | |
Q3.1 | 401.29 | 777.99 | 1579.48 | 412.47 | 1467.30 | 2387.84 | |
Q3.2 | 466.83 | 856.32 | 1651.12 | 484.24 | 1779.71 | 3242.41 | |
Q3.3 | 444.43 | 849.28 | 1636.10 | 463.30 | 2240.67 | 4146.23 | |
Q3.4 | 937.92 | 1546.29 | 2612.82 | 932.41 | 3326.44 | 6891.25 | |
Q4.1 | 409.63 | 824.52 | 1608.71 | 425.12 | 3036.08 | 6497.19/2.22% | |
Q4.2 | 417.70 | 844.03 | 1663.59 | 451.22 | 3413.08 | 5774.76/4.55% | |
Q4.3 | 1259.57 | 1994.06 | 3235.35 | 1689.60 | 5140.59 | 11645.83/2.38% | |
Total | 518.86 | 954.59 | 1818.23 | 578.31 | 2528.45 | 5016.44/0.86% | |
10000 | Q1.1 | 502.79 | 976.64 | 1876.83 | 772.26 | 4232.76 | 14508.03 |
Q1.2 | 385.83 | 728.87 | 1408.21 | 474.30 | 2437.27 | 4044.32 | |
Q1.3 | 411.04 | 766.88 | 1420.83 | 496.38 | 2232.82 | 2756.73 | |
Q2.1 | 426.85 | 808.79 | 1552.06 | 482.23 | 2435.83 | 2812.12 | |
Q2.2 | 428.44 | 808.44 | 1547.05 | 491.30 | 2148.69 | 2778.46 | |
Q2.3 | 424.88 | 790.97 | 1518.39 | 481.49 | 2035.67 | 2824.06 | |
Q3.1 | 430.00 | 812.11 | 1534.64 | 479.70 | 2358.08 | 2827.38 | |
Q3.2 | 491.93 | 879.05 | 1632.51 | 574.54 | 2547.74 | 3261.59/2.50% | |
Q3.3 | 483.87 | 876.05 | 1646.57 | 523.46 | 2501.18 | 5753.11/11.11% | |
Q3.4 | 966.57 | 1579.42 | 2639.72 | 1024.07 | 3624.23 | 7868.68/3.85% | |
Q4.1 | 433.51 | 843.04 | 1611.06 | 488.41 | 3110.84 | 6889.55 | |
Q4.2 | 447.69 | 882.36 | 1685.38 | 517.94 | 3099.13 | 4955.08 | |
Q4.3 | 1304.88 | 2131.19 | 3300.50 | 1777.68 | 5013.85 | 13861.27 | |
Total | 549.08 | 990.14 | 1796.84 | 659.67 | 2904.05 | 5644.54/1.35% | |
100000 | Q1.1 | 675.06 | 2042.42 | 4180.84 | 1082.81 | 8344.58 | 22257.57 |
Q1.2 | 523.85 | 1625.18 | 3348.57 | 815.97 | 5708.44 | 9131.57 | |
Q1.3 | 576.77 | 1689.07 | 3388.77 | 858.46 | 4994.33 | 8851.27 | |
Q2.1 | 593.84 | 1740.86 | 3410.08 | 853.22 | 5117.77 | 9297.82 | |
Q2.2 | 595.50 | 1772.65 | 3436.87 | 852.19 | 4861.39 | 9166.90 | |
Q2.3 | 586.94 | 1765.92 | 3466.89 | 807.70 | 4731.81 | 9088.02 | |
Q3.1 | 596.31 | 1769.98 | 3499.83 | 843.34 | 4979.61 | 9521.87 | |
Q3.2 | 684.93 | 1914.43 | 3652.75 | 958.34 | 5439.97 | 10442.05 | |
Q3.3 | 654.18 | 1871.97 | 3654.16 | 927.20 | 5939.36 | 11308.07 | |
Q3.4 | 1155.46 | 2517.51 | 4699.65 | 1413.27 | 7388.26 | 15554.08/1.67% | |
Q4.1 | 604.28 | 1743.92 | 3574.57 | 889.16 | 7002.80 | 13778.51/18.64% | |
Q4.2 | 625.01 | 1886.34 | 3752.79 | 909.96 | 7051.97 | 10190.56/16.67% | |
Q4.3 | 1561.09 | 3115.93 | 5360.42 | 2151.64 | 9693.87 | 22114.15 | |
Total | 725.39 | 1957.08 | 3799.19 | 1026.98 | 6228.20 | 12133.60/2.68% |
...
04 结论概述
- 增加一个 In-mem Segment 对查询并发性能有一个(不大的)固定的 overhead,e.g. 20~%
- 当修改行数特别大的时候,比如 #mod-rows > 10w,查询性能逐渐开始出现明显下降,原因是计算的重心逐渐移动到 In-mem 数据上,而不是预计算后的历史数据上。
- 功能正常,假设 all measures are sum
...