
Apache Kylin : Analytical Data Warehouse for Big Data


Property | Default | Description | Since
kylin.spark-conf.auto.prior
true
For CubeBuildJob and CubeMergeJob, it is important to allocate sufficient and appropriate resources (CPU/memory). The main configuration entries are:
  • spark.driver.memory
  • spark.executor.memory
  • spark.executor.cores
  • spark.executor.memoryOverhead
  • spark.executor.instances
  • spark.sql.shuffle.partitions

When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust the above config entries according to:
  • The number of cuboids to be built
  • The maximum size of the fact table
  • The resources available in the current resource manager's queue

Users can still override individual entries at the Cube level via the `kylin.engine.spark-conf.` prefix.
For details, see How to improve cube building and query performance.
4.0.0
kylin.engine.spark-conf.
null
Users can set the Spark configuration of a Cube/Merge job at the Cube level.
4.0.0
kylin.engine.driver-memory-base
1024

Driver memory (spark.driver.memory) is adjusted automatically based on the cuboid count and configuration.

kylin.engine.driver-memory-strategy defines the level boundaries. For example, "2,20,100" translates into four cuboid count ranges, from low to high:

  • Level 1 : (0, 2)
  • Level 2 : (2, 20)
  • Level 3 : (20, 100)
  • Level 4 : (100, +)

A level can therefore be found for any cuboid count: 12 falls in level 2, and 230 falls in level 4.

Driver memory is calculated by the following formula:

min(kylin.engine.driver-memory-base * level, kylin.engine.driver-memory-maximum)


4.0.0
kylin.engine.driver-memory-maximum
4096
See above.
4.0.0
kylin.engine.driver-memory-strategy
2,20,100
See above.
4.0.0
kylin.engine.base-executor-instance
5
4.0.0
kylin.engine.spark.required-cores
1
4.0.0
kylin.engine.executor-instance-strategy
100,2,500,3,1000,4
4.0.0
kylin.engine.retry-memory-gradient


4.0.0
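
The driver-memory rule described under kylin.engine.driver-memory-base can be sketched as follows. This is a minimal illustration, not Kylin's actual implementation: the function names `cuboid_level` and `driver_memory` are hypothetical, and the source does not say which level a count exactly equal to a boundary (for example 20) belongs to, so the sketch assumes it falls into the higher level.

```python
from bisect import bisect_right


def cuboid_level(cuboid_count: int, strategy: str = "2,20,100") -> int:
    """Map a cuboid count to a 1-based level using comma-separated boundaries."""
    thresholds = [int(t) for t in strategy.split(",")]
    # Assumption: a count equal to a boundary is placed in the higher level.
    return bisect_right(thresholds, cuboid_count) + 1


def driver_memory(cuboid_count: int,
                  base: int = 1024,
                  maximum: int = 4096,
                  strategy: str = "2,20,100") -> int:
    """min(driver-memory-base * level, driver-memory-maximum), as in the formula."""
    level = cuboid_level(cuboid_count, strategy)
    return min(base * level, maximum)


if __name__ == "__main__":
    for count in (1, 12, 50, 230):
        print(count, cuboid_level(count), driver_memory(count))
```

With the default values listed in the table, a cube with 12 cuboids lands in level 2 and a cube with 230 cuboids lands in level 4, matching the examples in the description; the maximum caps the result once base times level exceeds it.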


Resource Detect File Summary

The following files are located under WORKING-DIR/$PROJECT/job_tmp/${JOB_ID}/share and are produced in the first step of BuildJob. They feed the automatic Spark resource adjustment strategy (source code: ResourceDetectBeforeCubingJob).

Resource Detect File | Data Type | Format | Description
count_distinct.json
Boolean
Binary

Whether the cube contains a COUNT_DISTINCT (bitmap) measure.

Sample :

true

${JOB_ID}_resource_path.json

Map<String, List<String>>

Binary

The key is the cuboid ID, and the value is the list of partition paths of the cuboid's parent dataset.

A key of -1 denotes the flat table.

Sample :

{
   "-1" : ["hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/lineitem", "hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/part"]
}


${JOB_ID}_cubing_detect_items.json
Map<String, Integer>
Binary

The key is the cuboid ID, and the value is the partition count of the cuboid's parent dataset.

Sample : 

{
  "-1": 32
}
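
To show how the two map-shaped files relate, the sketch below joins the samples above by cuboid ID. It assumes the file contents can be read as JSON text, as the samples suggest, even though the Format column says Binary. The function name `summarize` and the shape of its result are hypothetical and are not part of Kylin.

```python
import json

# Sample contents copied from the table above.
RESOURCE_PATHS_SAMPLE = """
{
   "-1" : ["hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/lineitem", "hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/part"]
}
"""
DETECT_ITEMS_SAMPLE = """
{
  "-1": 32
}
"""

FLAT_TABLE_ID = "-1"  # per the description, -1 denotes the flat table


def summarize(resource_paths_json: str, detect_items_json: str) -> dict:
    """Join partition paths and partition counts by cuboid ID."""
    paths = json.loads(resource_paths_json)
    counts = json.loads(detect_items_json)
    summary = {}
    for cuboid_id in sorted(set(paths) | set(counts)):
        summary[cuboid_id] = {
            "is_flat_table": cuboid_id == FLAT_TABLE_ID,
            "paths": paths.get(cuboid_id, []),
            "partition_count": counts.get(cuboid_id),
        }
    return summary


if __name__ == "__main__":
    for cuboid_id, info in summarize(RESOURCE_PATHS_SAMPLE, DETECT_ITEMS_SAMPLE).items():
        print(cuboid_id, info["partition_count"], len(info["paths"]))
```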



Global dictionary

Property | Default | Description | Since

kylin.dictionary.detect-data-skew-sample-enabled



kylin.dictionary.detect-data-skew-sample-rate



kylin.dictionary.detect-data-skew-percentage-threshold


