Preparation

To let readers understand the performance difference between Kylin 4 on S3 and Kylin 4 on S3 with Soft Affinity and Local Cache simply and directly, I ran a performance benchmark in a standard software and hardware environment. Because I am familiar with AWS products, I chose AWS (EC2, S3) as the benchmark platform.

I chose TPC-H (https://github.com/Kyligence/kylin-tpch) as the benchmark standard. The scale factor used in this test is 100 (meaning the fact table has 600 million rows).

The following table lists the aspects compared in this benchmark report.

Metric	Description
Cubing Duration	Duration of the pre-calculation (cube building) process, which loads the source tables into Kylin.
Cube Size	Disk space occupied by the cube/index.
Response Time	Average response time over a serial query test lasting fifteen minutes.
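
The Response Time measurement can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the harness used for this report: `run_query` is a hypothetical callable that submits one SQL statement and blocks until the result arrives, and the round-robin order and per-query averaging are assumptions.

```python
import time
from statistics import mean


def serial_benchmark(run_query, queries, duration_s=15 * 60, clock=time.monotonic):
    """Run queries one at a time until duration_s has elapsed.

    run_query -- hypothetical callable: takes a SQL string, blocks until done.
    queries   -- dict mapping a query name (e.g. "Q1") to its SQL text.
    Returns a dict mapping each query name to its average response time
    in seconds, as measured by clock.
    """
    if not queries:
        return {}
    samples = {name: [] for name in queries}
    deadline = clock() + duration_s
    while clock() < deadline:
        for name, sql in queries.items():
            start = clock()
            run_query(sql)  # serial: the next query starts only after this returns
            samples[name].append(clock() - start)
            if clock() >= deadline:
                break
    # Queries that never ran within the time window are left out.
    return {name: mean(times) for name, times in samples.items() if times}
```

The `clock` parameter exists only so the loop can be exercised with a fake clock; in a real run the default `time.monotonic` would be used.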

The following tables list the software and hardware used in this performance benchmark.

The test used EC2 nodes in three roles.

Distribution Node (running ZooKeeper and MySQL):

Item	Value
Instance Type	m5.xlarge
Node Memory	16 GB
Node vCPU	4
Node Disk	30 GB (gp2)
Node Count	1
Network Bandwidth	5 Gbps
ZooKeeper Version	3.4.3
MySQL Version	5.7

Master Node (running Kylin 4, Spark Master, and Hive Metastore):

Item	Value
Instance Type	m5.4xlarge
Node Memory	64 GB
Node vCPU	16
Node Disk	100 GB (gp2)
Node Count	1
Network Bandwidth	5 Gbps
Kylin Version	4.0.0
Spark Version	3.1.1 (on Hadoop 3.2)
Hive Version	2.3.9

Slave Node (running only the Spark worker):

Item	Value
Instance Type	m5.4xlarge
Node vCPU	16
Node Disk	400 GB x 2 (SSD)
Node Count	4
Network Bandwidth	5 Gbps
Spark Version	3.1.1 (on Hadoop 3.2)

Benchmark Results

Figure-1 : Cubing duration of TPC-H (sf = 100)

	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8	Q9	Q10	Q11	Q12	Q13	Q14	Q15	Q16	Q17	Q18	Q19	Q20	Q21	Q22
Cubing duration(Minutes)	1.2	9.55	43.45	30.57	26.78	18.9	18.6	53.85	61.63	30.45	10.73	37.25	24.03	25.2	40.13	9.22	53.07	46	45.5	44	97.15	45

Table-1 : Cubing duration of TPC-H (sf = 100)

Figure-2 : Storage size of TPC-H (sf = 100)

	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8	Q9	Q10	Q11	Q12	Q13	Q14	Q15	Q16	Q17	Q18	Q19	Q20	Q21	Q22
Cuboid Storage(GB)	0.000152	1.55	12.74	0.46	4.03	0.0031	2.92	16.89	37.55	5.04	1.81	0.7892	4.05	0.1278	0.2137	1.9	6.84	9.11	5.56	7.89	31.75	3.5138

Table-2 : Storage size of TPC-H (sf = 100)
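
To put Table-1 and Table-2 in perspective, the short script below totals the per-query values and picks out the largest entry in each table. It is only an illustrative sketch: the numbers are copied from the two tables, and treating the sum as an overall build cost assumes the per-query cubes are built one after another, which this report does not state.

```python
QUERIES = [f"Q{i}" for i in range(1, 23)]

# Values copied from Table-1 (minutes) and Table-2 (GB).
CUBING_MINUTES = [1.2, 9.55, 43.45, 30.57, 26.78, 18.9, 18.6, 53.85, 61.63, 30.45, 10.73,
                  37.25, 24.03, 25.2, 40.13, 9.22, 53.07, 46, 45.5, 44, 97.15, 45]
STORAGE_GB = [0.000152, 1.55, 12.74, 0.46, 4.03, 0.0031, 2.92, 16.89, 37.55, 5.04, 1.81,
              0.7892, 4.05, 0.1278, 0.2137, 1.9, 6.84, 9.11, 5.56, 7.89, 31.75, 3.5138]


def summarize(values):
    """Return (total, name of the query with the largest value, that value)."""
    largest = max(range(len(values)), key=values.__getitem__)
    return sum(values), QUERIES[largest], values[largest]


total_minutes, slowest_query, slowest_minutes = summarize(CUBING_MINUTES)
total_gb, largest_query, largest_gb = summarize(STORAGE_GB)

print(f"Cubing: {total_minutes:.2f} min total, longest {slowest_query} ({slowest_minutes} min)")
print(f"Storage: {total_gb:.2f} GB total, largest {largest_query} ({largest_gb} GB)")
```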

Figure-3 : Avg response time of TPC-H Query (sf=100)

	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8	Q9	Q10	Q11	Q12	Q13	Q14	Q15	Q16	Q17	Q18	Q19	Q20	Q21	Q22
Only S3	1036	5249.33	2559.89	2126	552.22	441.11	531.44	2563.89	3850.33	2819.56	6749.22	810.67	19969.11	427.78	9927.67	5965.67	2086.13	9519.63	968.5	11847.38	41053	6330.5
Soft Affinity + Local Cache	610.27	4048.91	2705.82	5359.18	285.55	192.73	341.55	1587.82	2433.64	3519.18	5666.55	464.18	22198.73	216.09	7975.91	4673.6	1131.8	8692.5	533.6	7135.1	30987.9	4167.8

Table-3: Avg response time of TPC-H Query (sf=100)
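
The per-query effect of the cache can be read off Table-3 by dividing the S3-only time by the cached time. The script below is a small sketch of that arithmetic, with the values copied from Table-3; the names `S3_ONLY`, `WITH_CACHE`, and `speedups` are illustrative only.

```python
QUERIES = [f"Q{i}" for i in range(1, 23)]

# Average response times copied from Table-3 (both rows use the same unit).
S3_ONLY = [1036, 5249.33, 2559.89, 2126, 552.22, 441.11, 531.44, 2563.89, 3850.33, 2819.56,
           6749.22, 810.67, 19969.11, 427.78, 9927.67, 5965.67, 2086.13, 9519.63, 968.5,
           11847.38, 41053, 6330.5]
WITH_CACHE = [610.27, 4048.91, 2705.82, 5359.18, 285.55, 192.73, 341.55, 1587.82, 2433.64,
              3519.18, 5666.55, 464.18, 22198.73, 216.09, 7975.91, 4673.6, 1131.8, 8692.5,
              533.6, 7135.1, 30987.9, 4167.8]


def speedups(baseline, candidate):
    """Map each query to baseline / candidate; values above 1 mean the candidate is faster."""
    return {q: b / c for q, b, c in zip(QUERIES, baseline, candidate)}


ratios = speedups(S3_ONLY, WITH_CACHE)
faster = [q for q, r in ratios.items() if r > 1.0]
slower = [q for q, r in ratios.items() if r < 1.0]

for query, ratio in ratios.items():
    print(f"{query}: {ratio:.2f}x")
print("Faster with cache:", ", ".join(faster))
print("Slower with cache:", ", ".join(slower))
```

A ratio is used rather than an absolute difference so that short and long queries can be compared on the same scale.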

Conclusions

Query performance.

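In big-query scenarios on TPC-H at scale factor 100 (queries that scan a large number of partitions/files and perform complex calculations at query time), the response time of Kylin 4 on S3 with Soft Affinity and Local Cache is significantly lower than that of Kylin 4 on S3 only. For example, in Table-3 the average for Q21 drops from 41053 to 30987.9 and for Q20 from 11847.38 to 7135.1.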

Thanks to Soft Affinity and Local Cache, Kylin 4 query performance improves for most queries: 18 of the 22 queries in Table-3 have a lower average response time.

For some queries (most notably Q4 and Q13, and to a lesser extent Q3 and Q10 in Table-3), response times with Soft Affinity and Local Cache enabled are higher than with S3 alone as storage. A possible cause is that the data was not read through the cache. The underlying reason was not investigated in this test; we will analyze it further and address it in subsequent optimization work.

In conclusion, Soft Affinity and Local Cache can achieve significant performance improvements for both simple and complex queries.
