Apache Kylin : Analytical Data Warehouse for Big Data
...
Property | Default | Description | Since |
---|---|---|---|
kylin.spark-conf.auto.prior | true | A CubeBuildJob or CubeMergeJob needs enough, properly sized resources (CPU and memory). These are controlled mainly by the following config entries:
When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust the above config entries according to:
Users can still override individual entries at the Cube level via the `kylin.engine.spark-conf.` prefix. For details, see How to improve cube building and query performance | 4.0.0 |
kylin.engine.spark-conf. | null | Users can set the Spark configuration of a Cube build or merge job at the Cube level. | 4.0.0 |
kylin.engine.driver-memory-base | 1024 | Driver memory (spark.driver.memory) is adjusted automatically based on the cuboid count and configuration. kylin.engine.driver-memory-strategy defines the levels. For example, "2,20,100" translates into four cuboid count ranges, from low to high, as follows:
Each cuboid count therefore maps to a level: 12 falls in level 2, and 230 falls in level 4. Driver memory is then calculated by the following formula:
| 4.0.0 |
kylin.engine.driver-memory-maximum | 4096 | See above. | 4.0.0 |
kylin.engine.driver-memory-strategy | 2,20,100 | See above. | 4.0.0 |
kylin.engine.base-executor-instance | 5 | | 4.0.0 |
kylin.engine.spark.required-cores | 1 | | 4.0.0 |
kylin.engine.executor-instance-strategy | 100,2,500,3,1000,4 | | 4.0.0 |
kylin.engine.retry-memory-gradient | | | 4.0.0 |
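The level lookup described for `kylin.engine.driver-memory-strategy` can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kylin's actual code: the function names `cuboid_level` and `driver_memory` are hypothetical, a cuboid count equal to a threshold is assumed to belong to the higher range, and the formula (base multiplied by level, capped at the maximum) is an assumption, since the formula itself is not shown in the table above.

```python
from bisect import bisect_right


def cuboid_level(cuboid_count: int, strategy: str = "2,20,100") -> int:
    """Map a cuboid count to a 1-based level using the strategy thresholds."""
    thresholds = sorted(int(t) for t in strategy.split(","))
    # Assumption: a count equal to a threshold falls into the higher range.
    return bisect_right(thresholds, cuboid_count) + 1


def driver_memory(cuboid_count: int,
                  base: int = 1024,
                  maximum: int = 4096,
                  strategy: str = "2,20,100") -> int:
    """Assumed formula: scale the base by the level, capped at the maximum."""
    return min(base * cuboid_level(cuboid_count, strategy), maximum)


# The two cuboid counts mentioned in the table: 12 and 230.
print(cuboid_level(12), driver_memory(12))
print(cuboid_level(230), driver_memory(230))
```

With the default strategy, the sketch reproduces the levels stated in the description (12 maps to level 2, 230 to level 4). The resulting memory values depend entirely on the assumed formula.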
Resource Detect File Summary
The following files are located under WORKING-DIR/$PROJECT/job_tmp/${JOB_ID}/share and are produced in the first step of the BuildJob. They feed the automatic Spark resource adjustment strategy (source code: ResourceDetectBeforeCubingJob).
Resource Detect File | Data Type | Format | Description |
---|---|---|---|
count_distinct.json | Boolean | Binary | Whether the Cube contains a COUNT_DISTINCT (bitmap) measure. Sample: true |
${JOB_ID}_resource_path.json | Map<String, List<String>> | Binary | Key is the cuboid ID; value is the partition paths of the cuboid's parent dataset. -1 means the Flat Table. Sample:
|
${JOB_ID}_cubing_detect_items.json | Map<String, Integer> | Binary | Key is the cuboid ID; value is the partition count of the cuboid's parent dataset. Sample:
|
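To make the `Map<String, Integer>` shape of `${JOB_ID}_cubing_detect_items.json` concrete, here is a minimal Python sketch. It assumes the file content can be decoded as UTF-8 JSON, which the `.json` extension suggests but the table does not confirm. The helper names and the sample payload are hypothetical and are not taken from a real build job.

```python
import json
from typing import Dict


def load_detect_items(raw: bytes) -> Dict[str, int]:
    """Decode a cuboid ID -> partition count map (assumed UTF-8 JSON)."""
    parsed = json.loads(raw.decode("utf-8"))
    return {str(key): int(value) for key, value in parsed.items()}


def largest_parent(items: Dict[str, int]) -> str:
    """Return the cuboid ID whose parent dataset has the most partitions."""
    return max(items, key=items.get)


# Hypothetical payload for illustration only.
sample = b'{"-1": 8, "255": 3}'
items = load_detect_items(sample)
print(items, largest_parent(items))
```

A reader for `${JOB_ID}_resource_path.json` would look the same, except that each value would be a list of path strings rather than an integer.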
Global dictionary
Property | Default | Description | Since |
---|---|---|---|
kylin.dictionary.detect-data-skew-sample-enabled | | | |
kylin.dictionary.detect-data-skew-sample-rate | | | |
kylin.dictionary.detect-data-skew-percentage-threshold | | | |
...