Apache Kylin : Analytical Data Warehouse for Big Data

...

    For example, if a cuboid includes the three columns BUYER_ID, TRANS_ID, and LEAF_CATEG_ID, the data in each partition is sorted by the BUYER_ID column when this cuboid's data is saved.

    Note: Apache Spark 2.4.6, which Kylin 4.0 uses, can currently filter out unwanted data only through the min-max index of each RowGroup in a parquet file. If a parquet file contains several RowGroups, Spark uses the min-max index of each RowGroup to skip the ones that cannot match the filter. If a parquet file contains only one RowGroup, the filter has no effect.
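The effect of sorting on RowGroup filtering can be illustrated with a small simulation. This is only a sketch in plain Python, not Kylin or Spark code: the helper names `split_into_row_groups` and `row_groups_to_read`, the sample BUYER_ID values, and the RowGroup size of 4 rows are all assumptions made for illustration.

```python
from typing import List, Tuple

RowGroup = Tuple[int, int, List[int]]  # (min, max, values)


def split_into_row_groups(values: List[int], rows_per_group: int) -> List[RowGroup]:
    """Chunk a column into consecutive row groups and record each group's min and max."""
    chunks = [values[i:i + rows_per_group] for i in range(0, len(values), rows_per_group)]
    return [(min(chunk), max(chunk), chunk) for chunk in chunks]


def row_groups_to_read(row_groups: List[RowGroup], target: int) -> List[int]:
    """Return the indices of row groups whose min-max range could contain the target."""
    return [i for i, (low, high, _) in enumerate(row_groups) if low <= target <= high]


# Hypothetical BUYER_ID values as they might arrive, unsorted, in one partition.
buyer_ids = [7, 1, 12, 3, 9, 5, 11, 2, 8, 4, 10, 6]

unsorted_groups = split_into_row_groups(buyer_ids, 4)
sorted_groups = split_into_row_groups(sorted(buyer_ids), 4)
single_group = split_into_row_groups(sorted(buyer_ids), len(buyer_ids))

# Filter: BUYER_ID = 5
unsorted_hits = row_groups_to_read(unsorted_groups, 5)  # every group overlaps 5
sorted_hits = row_groups_to_read(sorted_groups, 5)      # only one group overlaps 5
single_hits = row_groups_to_read(single_group, 5)       # the only group must be read

print(unsorted_hits)  # [0, 1, 2]
print(sorted_hits)    # [1]
print(single_hits)    # [0]
```

In the unsorted case every RowGroup's range overlaps the filter value, so nothing is skipped. After sorting, the ranges no longer overlap and two of the three RowGroups can be skipped. With a single RowGroup there is nothing smaller to skip, which matches the note above.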

Pack a number of small files into a single partition

...