1. Background

Kylin 4 is a major architectural upgrade. As the picture below shows, both the cube building engine and the query engine use Spark as the calculation engine, and cube data is stored in Parquet files instead of HBase.

As a result, build and query performance tuning in Kylin 4 is very different from tuning in Kylin 3. This article introduces how to improve cube build and query performance in Kylin 4, including some optimizations that Kylin 4 already applies automatically.

2. Cube building performance tuning

In Kylin 4, a cube building job has two steps. The first step detects how many source files will be built as cube data. The second step builds the snapshot tables (if needed), generates the global dictionary (if needed), and builds the cube data as Parquet files. All calculations in the second step carry a relatively heavy load. So besides using Joint and Hierarchy on dimensions to reduce the number of cuboids (refer to the section ‘Reduce combinations’ in http://kylin.apache.org/docs/tutorial/cube_build_performance.html ), it is also very important to use proper Spark resources and configurations to build cube data. This section covers 3 key points for improving cube build performance.
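To give a rough feel for why Joint and Hierarchy dimensions reduce the build load, the sketch below counts cuboid combinations under a simplified model: each normal dimension is either present or absent, each joint group behaves like a single dimension, and a hierarchy of k levels contributes k + 1 valid prefixes. The function name `cuboid_count` and the dimension names are hypothetical, and the model ignores mandatory dimensions, multiple aggregation groups, and any pruning Kylin applies, so treat the numbers as an estimate only.

```python
from math import prod


def cuboid_count(n_normal, joint_groups=(), hierarchies=()):
    """Estimate cuboid combinations in one aggregation group (simplified model).

    n_normal     -- number of normal dimensions (each is present or absent)
    joint_groups -- sequences of dimensions that always appear together
    hierarchies  -- sequences of dimensions ordered from top level to bottom level
    """
    normal = 2 ** n_normal                         # every subset of normal dimensions
    joint = 2 ** len(joint_groups)                 # each joint group acts as one dimension
    hier = prod(len(h) + 1 for h in hierarchies)   # empty prefix plus one per level
    return normal * joint * hier


# Hypothetical example: 10 dimensions, first all normal, then partly grouped.
all_normal = cuboid_count(10)
grouped = cuboid_count(
    4,
    joint_groups=[("d5", "d6", "d7")],
    hierarchies=[("country", "province", "city")],
)
print(all_normal, grouped)  # prints "1024 128"
```

In this hypothetical case, grouping six of the ten dimensions cuts the estimate from 1024 combinations to 128, which is the kind of reduction that makes the second build step cheaper before any Spark tuning is applied.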
