Page History

Versions Compared


...

For example, if a cuboid includes three columns, BUYER_ID, TRANS_ID, and LEAF_CATEG_ID, the data in each partition is sorted by the BUYER_ID column when the cuboid data is saved.

Note: Apache Spark 2.4.6, which Kylin 4.0 uses, can currently filter out unwanted data only through the min-max index of each RowGroup in a parquet file. If a parquet file contains several RowGroups, Spark uses the min-max index of each RowGroup to skip the ones that cannot contain matching data. If a parquet file contains only one RowGroup, the filter has no effect.
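The effect of sorting combined with the RowGroup min-max index can be illustrated with a small sketch. This is plain Python that only models the idea; it does not use the Spark or parquet APIs. The names `RowGroup`, `write_file`, and `groups_to_scan` are hypothetical and chosen for illustration, and each row is reduced to its BUYER_ID value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a parquet RowGroup holding BUYER_ID values."""
    rows: List[int]

    @property
    def min_max(self) -> Tuple[int, int]:
        # The min-max index: smallest and largest BUYER_ID in this group.
        return min(self.rows), max(self.rows)


def write_file(buyer_ids: List[int], rows_per_group: int) -> List[RowGroup]:
    """Sort by BUYER_ID, then cut the sorted rows into RowGroups."""
    ordered = sorted(buyer_ids)
    return [
        RowGroup(ordered[i:i + rows_per_group])
        for i in range(0, len(ordered), rows_per_group)
    ]


def groups_to_scan(row_groups: List[RowGroup], buyer_id: int) -> List[RowGroup]:
    """Keep only the RowGroups whose [min, max] range can contain buyer_id."""
    return [
        g for g in row_groups
        if g.min_max[0] <= buyer_id <= g.min_max[1]
    ]


buyer_ids = [7, 3, 9, 1, 8, 2, 6, 4, 5, 10, 12, 11]

# Several RowGroups in one file: the index lets most of them be skipped.
many = write_file(buyer_ids, rows_per_group=4)
print(len(groups_to_scan(many, 6)), "of", len(many), "RowGroups scanned")

# One RowGroup in the file: a value inside its range always hits that group.
single = write_file(buyer_ids, rows_per_group=len(buyer_ids))
print(len(groups_to_scan(single, 6)), "of", len(single), "RowGroups scanned")
```

Because the rows are sorted before they are split, the RowGroup ranges do not overlap, so a filter on BUYER_ID touches as few groups as possible. With a single RowGroup there is nothing left to skip for any value inside its range.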

Pack a number of small files into a single partition

...


Old Version 26

New Version 27
