Apache Kylin : Analytical Data Warehouse for Big Data
Page History
...
Then look into parquet files, data within files are sorted by rowkey columns, that is to say, prefix match in query is as important as Kylin on HBase. When a query condition satisfies prefix match, it can filter row groups with column's max/min index. Furthermore, we can reduce row group size to make finer index granularity, but be aware that the compression rate will be lower if we set row group size smaller.
Dynamic elimination of partitioning dimensions
...
Optimization of build engine
cache parent dataset
Kylin build cube layer by layer. For a parent layer with multi cuboids to read, we can choose to cache parent dataset by setting kylin.. Please try to complete this part if you have enough time. Shengjun Zhengengine.spark.parent-dataset.max.persist.count to a number greater than 0. But notice that if you set this value too small, it will affect the parallelism of build job, as the build granularity is at cuboid level.
Practice of Kylin 4 in Youzan
...