Apache Kylin : Analytical Data Warehouse for Big Data

...

  1. The first step detects how many source files will be built as cube data;
  2. The second step builds the snapshot tables (if needed), generates the global dictionary (if needed), and builds the cube data as Parquet files.

...

    After all the rules above are applied, you can find log messages like the following in the 'kylin.log' file:

...

Manually setting Spark configurations (if needed)

If cube building still performs poorly with the configuration values that Kylin adjusts automatically, you can try changing those values as appropriate, for example:

...
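As a rough illustration of how manual overrides are written, the sketch below renders a set of Spark settings as 'kylin.properties' lines using the `kylin.engine.spark-conf.` prefix that Kylin 4 uses for build-engine Spark properties. The helper name `to_kylin_properties` and the sample values (4G, 2 cores) are hypothetical, chosen only for the example; suitable values depend on your cluster and data volume.

```python
# Prefix Kylin 4 uses to pass Spark properties to the build engine.
SPARK_CONF_PREFIX = "kylin.engine.spark-conf."


def to_kylin_properties(spark_settings):
    """Render Spark settings as kylin.properties override lines.

    Hypothetical helper for illustration; it is not part of Kylin.
    """
    lines = []
    for key, value in sorted(spark_settings.items()):
        # Guard against keys that are not Spark properties.
        if not key.startswith("spark."):
            raise ValueError(f"not a Spark property: {key}")
        lines.append(f"{SPARK_CONF_PREFIX}{key}={value}")
    return "\n".join(lines)


# Sample values are assumptions, not recommendations.
overrides = to_kylin_properties({
    "spark.executor.memory": "4G",
    "spark.executor.cores": 2,
})
print(overrides)
```

The printed lines can be pasted into 'kylin.properties', or set at the project or cube level if you only want them to affect specific builds.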

Global dictionary building performance tuning

If the cube has accurate "count distinct" measures, Kylin 4.0 builds the global dictionary for these measure columns in the second step. The dictionary is built on Spark with distributed encoding, which reduces the pressure on a single machine node and lets the global dictionary grow beyond the maximum integer limit. For details, refer to the design article: https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark . There is one configuration for tuning global dictionary building:

...
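To make the idea of distributed encoding more concrete, here is a minimal single-process sketch of bucket-based dictionary encoding: distinct values are hashed into buckets, each bucket numbers its values independently, and per-bucket offsets turn those relative IDs into globally unique IDs. It is a simplified illustration under my own assumptions, not Kylin's actual implementation; the function name `build_global_dictionary` and the parameter `num_buckets` are hypothetical.

```python
from collections import defaultdict
import zlib


def build_global_dictionary(values, num_buckets=4):
    """Assign a unique integer ID to every distinct value using buckets."""
    # Step 1: route each distinct value to a bucket with a stable hash,
    # similar to how a distributed job repartitions rows across workers.
    buckets = defaultdict(list)
    for value in sorted(set(values)):
        bucket_id = zlib.crc32(value.encode("utf-8")) % num_buckets
        buckets[bucket_id].append(value)

    # Step 2: each bucket numbers its own values independently (1-based),
    # so this part needs no coordination between buckets.
    relative_ids = {
        bucket_id: {value: i for i, value in enumerate(members, start=1)}
        for bucket_id, members in buckets.items()
    }

    # Step 3: convert relative IDs to global IDs by adding the combined
    # size of all lower-numbered buckets as an offset.
    dictionary = {}
    offset = 0
    for bucket_id in sorted(relative_ids):
        for value, relative_id in relative_ids[bucket_id].items():
            dictionary[value] = offset + relative_id
        offset += len(relative_ids[bucket_id])
    return dictionary


sample = ["u1", "u2", "u3", "u2", "u1", "u4", "u5"]
encoded = build_global_dictionary(sample, num_buckets=3)
```

Because only step 3 needs knowledge of the other buckets (their sizes), the expensive per-value work in steps 1 and 2 can be spread across many workers, which is the property that relieves a single node.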