...

Property

Default

Description

Since

kylin.spark-conf.auto.prior

true

For a CubeBuildJob or CubeMergeJob, it is important to allocate sufficient and appropriate resources (CPU and memory). The main Spark config entries involved are:

spark.driver.memory
spark.executor.memory
spark.executor.cores
spark.executor.memoryOverhead
spark.executor.instances
spark.sql.shuffle.partitions

When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust the config entries above according to:

Count of cuboids to be built
Max size of fact table
Available resources in the current resource manager's queue

Users can still override individual entries at the Cube level via properties prefixed with `kylin.engine.spark-conf.`.
For details, see How to improve cube building and query performance.

4.0.0

kylin.engine.spark-conf.

null

Prefix that lets users set the Spark configuration of a Cube build or merge job at the Cube level.

4.0.0
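As a rough illustration of how a prefix-based override can be read, the sketch below strips the `kylin.engine.spark-conf.` prefix from a set of cube-level properties to obtain the Spark settings they target. The function name `extract_spark_overrides` and the sample values are hypothetical, and this is not Kylin's actual implementation.

```python
PREFIX = "kylin.engine.spark-conf."

def extract_spark_overrides(cube_props):
    """Return the Spark conf entries set via the cube-level prefix (illustrative only)."""
    return {
        key[len(PREFIX):]: value
        for key, value in cube_props.items()
        # Skip unrelated properties and a bare prefix with no Spark key after it.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }

# Hypothetical cube-level properties; the values are examples, not recommendations.
cube_props = {
    "kylin.engine.spark-conf.spark.executor.memory": "4G",
    "kylin.engine.spark-conf.spark.executor.cores": "2",
    "kylin.engine.driver-memory-base": "1024",
}
overrides = extract_spark_overrides(cube_props)
```

Here only the two prefixed entries end up in `overrides`; the third property does not carry the prefix and is left out.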

kylin.engine.driver-memory-base

1024

The driver memory (spark.driver.memory) is adjusted automatically according to the cuboid count and the configuration described here.

kylin.engine.driver-memory-strategy defines the level boundaries. For example, "2,20,100" yields four cuboid count ranges, from low to high:

Level 1 : (0, 2)
Level 2 : (2, 20)
Level 3 : (20, 100)
Level 4 : (100, +∞)

Each cuboid count therefore maps to a level: a count of 12 falls in level 2, and a count of 230 falls in level 4.

The driver memory is then calculated by the following formula:

min(kylin.engine.driver-memory-base * level, kylin.engine.driver-memory-maximum)

4.0.0
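The level lookup and formula for kylin.engine.driver-memory-base can be sketched as follows. This is a minimal illustration built only from the description above, not Kylin's source code; the function names are hypothetical, and the handling of a cuboid count that exactly equals a threshold (placed in the higher level here) is an assumption, since the ranges above leave the boundaries open.

```python
from bisect import bisect_right

def driver_memory_level(cuboid_count, strategy="2,20,100"):
    """Map a cuboid count to a level using the comma-separated thresholds."""
    thresholds = sorted(int(x) for x in strategy.split(","))
    # Assumption: a count equal to a threshold falls into the higher level.
    return bisect_right(thresholds, cuboid_count) + 1

def driver_memory(cuboid_count, base=1024, maximum=4096, strategy="2,20,100"):
    """min(driver-memory-base * level, driver-memory-maximum), using the defaults listed here."""
    return min(base * driver_memory_level(cuboid_count, strategy), maximum)
```

With the default values, a cuboid count of 12 gives level 2 and `driver_memory(12)` returns 2048, while a count of 230 gives level 4 and the result is capped at 4096.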

kylin.engine.driver-memory-maximum

4096

See kylin.engine.driver-memory-base above.

4.0.0

kylin.engine.driver-memory-strategy

2,20,100

See kylin.engine.driver-memory-base above.

4.0.0

kylin.engine.base-executor-instance

5

4.0.0

kylin.engine.spark.required-cores

1

4.0.0

kylin.engine.executor-instance-strategy

100,2,500,3,1000,4

4.0.0

kylin.engine.retry-memory-gradient

4.0.0

Resource Detect File Summary

The following files are located under WORKING-DIR/$PROJECT/job_tmp/${JOB_ID}/share and are produced in the first step of a BuildJob. They serve as input to the automatic Spark resource adjustment strategy (source code: ResourceDetectBeforeCubingJob).

Resource Detect File

Data Type

Format

Description

count_distinct.json

Boolean

Binary

Whether the Cube contains a COUNT_DISTINCT (bitmap) measure.

Sample:

true

${JOB_ID}_resource_path.json

Map<String, List<String>>

Binary

The key is a cuboid ID, and the value is the list of partition paths of that cuboid's parent dataset.

A key of -1 denotes the Flat Table.

Sample:

{
   "-1" : ["hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/lineitem", "hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/part"]
}

${JOB_ID}_cubing_detect_items.json

Map<String, Integer>

Binary

The key is a cuboid ID, and the value is the partition count of that cuboid's parent dataset.

Sample:

{
  "-1": 32
}
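To show how the two map-shaped files relate, the sketch below combines the sample contents of ${JOB_ID}_resource_path.json and ${JOB_ID}_cubing_detect_items.json into one summary per cuboid ID. It assumes the files can be read as JSON text in the shape of the samples above (the Format column lists them as Binary, so the on-disk encoding may differ), and the function name `summarize_detect_items` is hypothetical.

```python
import json

FLAT_TABLE_ID = "-1"  # per the description above, -1 denotes the Flat Table

def summarize_detect_items(resource_paths_json, detect_items_json):
    """Join parent dataset paths and partition counts by cuboid ID (illustrative only)."""
    paths = json.loads(resource_paths_json)
    counts = json.loads(detect_items_json)
    summary = {}
    for cuboid_id, partition_count in counts.items():
        label = "flat table" if cuboid_id == FLAT_TABLE_ID else f"cuboid {cuboid_id}"
        summary[label] = {
            "parent_paths": len(paths.get(cuboid_id, [])),
            "partitions": partition_count,
        }
    return summary

# Sample contents copied from the table above.
resource_paths_json = """{
  "-1": ["hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/lineitem",
         "hdfs://cdh-master:8020/user/hive/warehouse/tpch_flat_orc_10.db/part"]
}"""
detect_items_json = '{"-1": 32}'

summary = summarize_detect_items(resource_paths_json, detect_items_json)
```

For the samples shown, the summary contains a single entry for the Flat Table with two parent paths and 32 partitions.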

Global dictionary

Property	Default	Description	Since




kylin.dictionary.detect-data-skew-sample-enabled
kylin.dictionary.detect-data-skew-sample-rate
kylin.dictionary.detect-data-skew-percentage-threshold

...
