Apache Kylin : Analytical Data Warehouse for Big Data
Preparation
To show the performance difference between Kylin 4 on S3 and Kylin 4 on S3 with Soft Affinity and Local Cache simply and directly, I ran a performance benchmark in a standard software and hardware environment. Because I am familiar with AWS products, I chose AWS (EC2, S3) as the benchmark platform.
I chose TPC-H (https://github.com/Kyligence/kylin-tpch) as the benchmark standard. The scale factor used in this test is 100, meaning the fact table has 600 million rows.
The following table shows the aspects compared between different versions in this benchmark report.
Metric/Aspect | Description |
---|---|
Cubing Duration | Duration of the pre-calculation (cube building) process, which loads the source tables into Kylin. |
Cube Size | Disk space occupied by the cube/index. |
Response Time | Average response time measured over a serial query test lasting fifteen minutes. |
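The report does not describe the test harness itself, so the following is only a minimal sketch of how a fifteen-minute serial query test could be driven. `run_serial_benchmark`, `run_query`, and the `clock` parameter are hypothetical names introduced for illustration; `run_query` stands for whatever client call submits one SQL statement to Kylin and blocks until the result returns.

```python
import time
from collections import defaultdict
from statistics import mean


def run_serial_benchmark(queries, run_query, duration_s=15 * 60, clock=time.monotonic):
    """Run queries one at a time, round-robin, until duration_s has elapsed.

    queries:   dict mapping a query name (e.g. "Q1") to its SQL text
    run_query: hypothetical callable that submits one SQL string and blocks
    clock:     time source in seconds (injectable so the loop can be tested)
    Returns a dict mapping each query name to its average response time.
    """
    if not queries:
        return {}
    samples = defaultdict(list)
    deadline = clock() + duration_s
    while clock() < deadline:
        for name, sql in queries.items():
            if clock() >= deadline:
                break  # time budget used up; do not start another query
            start = clock()
            run_query(sql)  # serial: the next query starts only after this returns
            samples[name].append(clock() - start)
    return {name: mean(times) for name, times in samples.items()}
```

Because queries never overlap, each measured time reflects a single query running alone on the cluster, which is presumably why a serial test was chosen over a concurrent one.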
The following tables show the software and hardware used in this performance benchmark.
The test used three roles of EC2 node.
- Distribution Node (runs the ZooKeeper and MySQL services):
Item | Value |
---|---|
Instance Type | m5.xlarge |
Node Memory | 16 GB |
Node vCPU | 4 |
Node Disk | 30 GB(gp2) |
Node Count | 1 |
Network Bandwidth | 5 Gbps |
ZooKeeper Version | 3.4.3 |
MySQL Version | 5.7 |
- Master Node (runs Kylin 4, the Spark Master, and the Hive Metastore):
Item | Value |
---|---|
Instance Type | m5.4xlarge |
Node Memory | 64 GB |
Node vCPU | 16 |
Node Disk | 100 GB(gp2) |
Node Count | 1 |
Network Bandwidth | 5 Gbps |
Kylin Version | 4.0.0 |
Spark Version | 3.1.1(on Hadoop 3.2) |
Hive Version | 2.3.9 |
- Slave Node (runs only a Spark worker):
Item | Value |
---|---|
Instance Type | m5.4xlarge |
Node vCPU | 16 |
Node Disk | 2 x 400 GB (SSD) |
Node Count | 4 |
Network Bandwidth | 5 Gbps |
Spark Version | 3.1.1(on Hadoop 3.2) |
Benchmark Results
Figure-1 : Cubing duration of TPC-H (sf = 100)
Query | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 | Q21 | Q22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cubing duration (minutes) | 1.2 | 9.55 | 43.45 | 30.57 | 26.78 | 18.9 | 18.6 | 53.85 | 61.63 | 30.45 | 10.73 | 37.25 | 24.03 | 25.2 | 40.13 | 9.22 | 53.07 | 46 | 45.5 | 44 | 97.15 | 45 |
Table-1 : Cubing duration of TPC-H (sf = 100)
Figure-2 : Storage size of TPC-H (sf = 100)
Query | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 | Q21 | Q22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cuboid storage (GB) | 0.000152 | 1.55 | 12.74 | 0.46 | 4.03 | 0.0031 | 2.92 | 16.89 | 37.55 | 5.04 | 1.81 | 0.7892 | 4.05 | 0.1278 | 0.2137 | 1.9 | 6.84 | 9.11 | 5.56 | 7.89 | 31.75 | 3.5138 |
Table-2 : Storage size of TPC-H (sf = 100)
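To get an overall picture of the build cost, the per-query values in Table-1 and Table-2 can be summed. The snippet below is a small sketch that does this arithmetic. The variable names are my own, and adding up the durations assumes the cubes were built one after another, which the report does not state.

```python
# Values copied from Table-1 (minutes) and Table-2 (GB), in order Q1..Q22.
cubing_minutes = [
    1.2, 9.55, 43.45, 30.57, 26.78, 18.9, 18.6, 53.85, 61.63, 30.45, 10.73,
    37.25, 24.03, 25.2, 40.13, 9.22, 53.07, 46, 45.5, 44, 97.15, 45,
]
storage_gb = [
    0.000152, 1.55, 12.74, 0.46, 4.03, 0.0031, 2.92, 16.89, 37.55, 5.04, 1.81,
    0.7892, 4.05, 0.1278, 0.2137, 1.9, 6.84, 9.11, 5.56, 7.89, 31.75, 3.5138,
]
names = [f"Q{i}" for i in range(1, 23)]

total_minutes = sum(cubing_minutes)
total_gb = sum(storage_gb)

# Pair each value with its query name and pick the largest of each metric.
longest_build = max(zip(cubing_minutes, names))
largest_cube = max(zip(storage_gb, names))

print(f"Total cubing time: {total_minutes:.2f} min ({total_minutes / 60:.1f} h)")
print(f"Total cuboid storage: {total_gb:.2f} GB")
print(f"Longest build: {longest_build[1]} ({longest_build[0]} min)")
print(f"Largest cube: {largest_cube[1]} ({largest_cube[0]} GB)")
```

Under that assumption, the 22 cubes take roughly 772 minutes to build in total and occupy about 155 GB, with Q21 the slowest to build and Q9 the largest on disk.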
Figure-3 : Avg response time of TPC-H Query (sf=100)
Query | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 | Q21 | Q22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Only S3 | 1036 | 5249.33 | 2559.89 | 2126 | 552.22 | 441.11 | 531.44 | 2563.89 | 3850.33 | 2819.56 | 6749.22 | 810.67 | 19969.11 | 427.78 | 9927.67 | 5965.67 | 2086.13 | 9519.63 | 968.5 | 11847.38 | 41053 | 6330.5 |
Soft affinity + local cache | 610.27 | 4048.91 | 2705.82 | 5359.18 | 285.55 | 192.73 | 341.55 | 1587.82 | 2433.64 | 3519.18 | 5666.55 | 464.18 | 22198.73 | 216.09 | 7975.91 | 4673.6 | 1131.8 | 8692.5 | 533.6 | 7135.1 | 30987.9 | 4167.8 |
Table-3: Avg response time of TPC-H Query (sf=100)
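The two rows of Table-3 are easier to compare as per-query ratios. The following sketch computes them from the values above. The variable names are my own, and since the report does not state the unit of the response times, the code works only with ratios, which are unit-free.

```python
# Values copied from Table-3, in order Q1..Q22 (unit not stated in the report).
only_s3 = [
    1036, 5249.33, 2559.89, 2126, 552.22, 441.11, 531.44, 2563.89, 3850.33,
    2819.56, 6749.22, 810.67, 19969.11, 427.78, 9927.67, 5965.67, 2086.13,
    9519.63, 968.5, 11847.38, 41053, 6330.5,
]
soft_cache = [
    610.27, 4048.91, 2705.82, 5359.18, 285.55, 192.73, 341.55, 1587.82,
    2433.64, 3519.18, 5666.55, 464.18, 22198.73, 216.09, 7975.91, 4673.6,
    1131.8, 8692.5, 533.6, 7135.1, 30987.9, 4167.8,
]
names = [f"Q{i}" for i in range(1, 23)]

# Ratio > 1 means the soft affinity + local cache setup answered faster.
speedup = {n: s3 / sc for n, s3, sc in zip(names, only_s3, soft_cache)}

faster = [n for n in names if speedup[n] > 1]
slower = [n for n in names if speedup[n] < 1]

for n in names:
    print(f"{n}: {speedup[n]:.2f}x")
print("Faster with soft affinity + local cache:", ", ".join(faster))
print("Slower with soft affinity + local cache:", ", ".join(slower))
```

On these numbers, 18 of the 22 queries respond faster with soft affinity and local cache, while Q3, Q4, Q10, and Q13 respond slower.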