Apache Kylin : Analytical Data Warehouse for Big Data
...
Cubing duration and Storage size
Response Time
Conclusions
...
Compared with Kylin 3's MR (MapReduce) cubing engine, Kylin 4 reduces cubing duration by 62.6%. This comes from higher resource utilization and from removing the step that converts cuboids into a specific storage format (HFile).
In Kylin 3, cuboid files are stored in two different formats, whereas Kylin 4 stores them only as Parquet. Because Parquet has better encoding efficiency and a higher compression ratio, the disk space used by the same cube is reduced by 72.56%.
Kylin 3 (MR engine) has lower resource utilization
Kylin 4 (new Spark engine) has higher and more stable resource utilization
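A percentage reduction can be easier to compare once it is turned into an "x times" ratio: if Kylin 4 needs (1 - r) of what Kylin 3 needs, the Kylin 3 / Kylin 4 ratio is 1 / (1 - r). The sketch below applies that arithmetic to the 62.6% and 72.56% figures quoted above; the helper name `reduction_to_ratio` is hypothetical and not part of Kylin.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """Convert a percentage reduction (e.g. 62.6 for 62.6%) into an old/new ratio.

    If the new value is (1 - r) times the old one, then old / new = 1 / (1 - r).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Reductions quoted in the text for the same cube, Kylin 3 vs Kylin 4.
cubing_time_ratio = reduction_to_ratio(62.6)   # Kylin 3 duration / Kylin 4 duration
storage_ratio = reduction_to_ratio(72.56)      # Kylin 3 size / Kylin 4 size

print(f"cubing: Kylin 4 takes {100 - 62.6:.1f}% of the time (about {cubing_time_ratio:.2f}x faster)")
print(f"storage: Kylin 4 uses {100 - 72.56:.2f}% of the space (about {storage_ratio:.2f}x smaller)")
```

Read this way, a 62.6% shorter cubing duration is roughly a 2.67x speedup, and a 72.56% smaller cube is roughly 3.64x less disk space.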
Query performance
In big query scenarios (queries that scan a large number of partitions/files and perform complex on-site calculations on them), Kylin 3 is difficult to optimize: the HBase RegionServers (RS) and the Kylin Query Server have to be tuned repeatedly. In stress-test scenarios, the query node is unstable because it has to do post-calculation on a large data set, and query latency keeps getting worse over time. Kylin 4 removes the Query Server as a single-point bottleneck, so both response time and QPS improve noticeably and performance stays stable during the stress test. On the TPC-H query set, Kylin 4's response time is 5 to 7 times better than Kylin 3's, and its concurrency is 4 times higher.
P95 response time of TPC-H queries under different concurrency
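The figure above reports P95 response time, i.e. the latency that 95% of queries stay at or below. The page does not say how its percentiles were computed, so the sketch below assumes the nearest-rank definition, one common choice (other tools interpolate between samples and can give slightly different values). The latency samples are hypothetical and are not taken from the benchmark.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; p * n is computed before dividing to avoid float rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical per-query latencies (seconds) at one concurrency level of a stress test.
latencies = [0.8, 1.1, 0.9, 1.3, 4.2, 1.0, 0.7, 1.2, 0.95, 1.05,
             0.85, 1.15, 0.9, 1.0, 1.1, 0.98, 1.02, 0.88, 1.25, 3.1]

print(f"P50 = {percentile_nearest_rank(latencies, 50):.2f} s")
print(f"P95 = {percentile_nearest_rank(latencies, 95):.2f} s")
```

With these 20 samples the median is 1.00 s but P95 is 3.10 s. This is why a tail percentile, rather than an average, exposes the kind of instability under load that the text describes for Kylin 3.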
In point query scenarios (queries that scan a small number of partitions/files and need little on-site calculation), Kylin 4 meets the sub-second latency requirement after some simple parameter tuning, and its performance is close to Kylin 3's (only slightly worse).
...