
Apache Kylin : Analytical Data Warehouse for Big Data

Welcome to Kylin Wiki.

1. Background

    Kylin 4 is a major architecture upgrade. As the architecture diagram below shows, both the cube building engine and the query engine use Spark as the calculation engine, and cube data is stored in Parquet files instead of HBase.


As a result, build and query performance tuning in Kylin 4 is very different from tuning in Kylin 3. This article introduces how to improve cube build and query performance in Kylin 4, including some tuning that Kylin 4 already performs automatically.


2. Cube building performance tuning

    In Kylin 4, a cube building job has two steps. The first step detects how many source files will be built into cube data. The second step builds the snapshot tables (if needed), generates the global dictionary (if needed), and builds the cube data as Parquet files. All calculations in the second step are relatively heavy, so besides using Joint and Hierarchy on dimensions to reduce the number of cuboids (see the section 'Reduce combinations' in http://kylin.apache.org/docs/tutorial/cube_build_performance.html), it is also very important to use proper Spark resources and configurations when building cube data. There are 3 key points in this section to improve cube build performance.
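
    Before turning to those points, the effect of Joint and Hierarchy on the number of cuboids mentioned above can be illustrated with a small calculation. This is a minimal sketch under a simplified model, not Kylin's actual cuboid scheduler: it assumes every normal dimension is either present or absent, a hierarchy of k levels contributes only its k+1 prefixes, and a joint group is either fully present or fully absent. The function name cuboid_count and its parameters are hypothetical and exist only for this illustration.

```python
def cuboid_count(normal=0, hierarchies=(), joints=0):
    """Estimate dimension combinations in one aggregation group.

    Simplified model (an assumption, not Kylin's real pruning logic):
    - normal:      number of independent dimensions
    - hierarchies: tuple with the depth of each hierarchy
    - joints:      number of joint groups
    """
    count = 2 ** normal              # each normal dimension is in or out
    for depth in hierarchies:
        count *= depth + 1           # only prefixes of length 0..depth are kept
    count *= 2 ** joints             # each joint group behaves like one dimension
    return count


# 10 dimensions with no optimization
print(cuboid_count(normal=10))                              # 1024

# Same 10 dimensions: 4 normal, one 3-level hierarchy, one joint of 3
print(cuboid_count(normal=4, hierarchies=(3,), joints=1))   # 128
```

    Under these assumptions, grouping 6 of the 10 dimensions into one hierarchy and one joint reduces the combinations from 1024 to 128, which is why fewer cuboids mean less work in the heavy second step.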
















