...

Next, look at the Parquet files. Data within each file is sorted by the rowkey columns, so prefix matching in a query is just as important as it is for Kylin on HBase. When a query condition satisfies prefix matching, row groups can be filtered out using the min/max index of the column. Furthermore, we can reduce the row group size to get finer index granularity, but be aware that the compression ratio will be lower if the row group size is set smaller.
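To illustrate why prefix matching matters, here is a minimal sketch in plain Python. It does not use Parquet or Kylin APIs; the helper names (`build_row_groups`, `matching_groups`) and the toy data are assumptions made only for this illustration. Rows are sorted by two rowkey columns, split into fixed-size row groups, and each group is kept or skipped based on the min/max of the filtered column.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (first rowkey column, second rowkey column)


def build_row_groups(rows: List[Row], group_size: int) -> List[List[Row]]:
    """Sort rows by rowkey order and cut them into fixed-size row groups."""
    ordered = sorted(rows)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def matching_groups(groups: List[List[Row]], col: int, value: int) -> List[List[Row]]:
    """Keep only row groups whose [min, max] range on `col` may contain `value`."""
    kept = []
    for group in groups:
        values = [row[col] for row in group]
        if min(values) <= value <= max(values):
            kept.append(group)
    return kept


# Toy data: 100 rows, every combination of two columns in 0..9.
rows = [(a, b) for a in range(10) for b in range(10)]

# Filtering on the first rowkey column (prefix match) prunes most row groups.
fine_groups = build_row_groups(rows, group_size=10)
prefix_hits = matching_groups(fine_groups, col=0, value=3)

# Filtering only on the second column (no prefix match) prunes nothing here,
# because every row group spans the full range of that column.
non_prefix_hits = matching_groups(fine_groups, col=1, value=3)

# A larger row group gives coarser min/max ranges, so more rows are scanned.
coarse_groups = build_row_groups(rows, group_size=25)
coarse_rows_scanned = sum(len(g) for g in matching_groups(coarse_groups, col=0, value=3))
fine_rows_scanned = sum(len(g) for g in prefix_hits)

print(len(prefix_hits), len(non_prefix_hits), coarse_rows_scanned, fine_rows_scanned)
```

The sketch only models index granularity. It says nothing about compression, which is the cost of choosing smaller row groups in practice.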

Dynamic elimination of partitioning dimensions

...

Optimization of build engine

Cache parent dataset


Kylin builds a cube layer by layer. When a parent layer has to be read by multiple cuboids, we can choose to cache the parent dataset by setting kylin.engine.spark.parent-dataset.max.persist.count to a number greater than 0. Note, however, that setting this value too small will affect the parallelism of the build job, because the build granularity is at the cuboid level.
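As a configuration sketch, the setting could be placed in kylin.properties as shown here. The value 2 is only an illustrative assumption, not a recommendation; a suitable number depends on the cube and the cluster.

```properties
# Enable caching of the parent dataset (any value greater than 0).
# The value below is a placeholder for illustration only.
kylin.engine.spark.parent-dataset.max.persist.count=2
```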

Practice of Kylin 4 in Youzan

...
