Apache Kylin : Analytical Data Warehouse for Big Data
...
For example, if a cuboid includes the three columns BUYER_ID, TRANS_ID and LEAF_CATEG_ID, Kylin sorts the data within each partition by the BUYER_ID column when saving this cuboid's data.
Note: Apache Spark 2.4.6, the version Kylin 4.0 currently uses, can only filter out unwanted data through the min-max index of each RowGroup in a Parquet file. If a Parquet file contains several RowGroups, Spark can skip the RowGroups whose min-max range does not match the filter. If a Parquet file contains only one RowGroup, however, this filtering does not take effect.
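The interaction between sorting by BUYER_ID and RowGroup min-max filtering can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not Spark or Parquet code: the names `RowGroup`, `write_row_groups` and `rows_to_scan` are hypothetical, and it assumes min-max statistics are kept only for BUYER_ID (the first column of each row tuple). It counts how many rows a reader would still have to scan for an equality filter on BUYER_ID.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical row layout: (BUYER_ID, TRANS_ID, LEAF_CATEG_ID)
Row = Tuple[int, int, int]


@dataclass
class RowGroup:
    rows: List[Row]
    min_buyer: int  # min-max statistics for BUYER_ID only (an assumption of this sketch)
    max_buyer: int


def write_row_groups(rows: List[Row], rows_per_group: int, sort_first: bool) -> List[RowGroup]:
    """Split rows into simulated row groups and record the BUYER_ID min/max of each."""
    if sort_first:
        # Sort by BUYER_ID, the first column of the cuboid, before writing.
        rows = sorted(rows, key=lambda r: r[0])
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        buyers = [r[0] for r in chunk]
        groups.append(RowGroup(chunk, min(buyers), max(buyers)))
    return groups


def rows_to_scan(groups: List[RowGroup], buyer_id: int) -> int:
    """Rows read for `BUYER_ID = buyer_id`; groups whose range excludes it are skipped."""
    return sum(len(g.rows) for g in groups if g.min_buyer <= buyer_id <= g.max_buyer)
```

For instance, with 100 rows whose BUYER_ID cycles through 0 to 9, writing unsorted data in groups of 10 gives every group the range [0, 9], so a filter on BUYER_ID = 4 still reads all 100 rows; sorting first makes each group hold a single buyer, so only 10 rows are read. Writing the same 100 rows as one group puts everything in a single min-max range, so the filter on BUYER_ID = 4 reads all 100 rows even though the data is sorted.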
Pack a number of small files into a single partition
...