
Apache Kylin : Analytical Data Warehouse for Big Data

...

Basic configuration

Property | Default | Description | Since
kylin.snapshot.parallel-build-enabled | | |
kylin.snapshot.parallel-build-timeout-seconds | | |
kylin.snapshot.shard-size-mb | | |
kylin.storage.columnar.shard-size-mb | | |
kylin.storage.columnar.shard-rowcount | | |
kylin.storage.columnar.shard-countdistinct-rowcount | | |
kylin.storage.columnar.repartition-threshold-size-mb | | |

Advanced configuration

Property | Default | Since
kylin.engine.submit-hadoop-conf-dir | null |
kylin.engine.spark.cache-parent-dataset-storage-level | NONE | 4.0.0
kylin.engine.spark.cache-parent-dataset-count | 1 | 4.0.0
kylin.engine.build-base-cuboid-enabled | true | 4.0.0
kylin.engine.spark.repartition.dataset.after.encode-enabled | false | 4.0.0
kylin.engine.spark.repartition.dataset.after.encode.num | 0 | 4.0.0 (see the note below)

Note on kylin.engine.spark.repartition.dataset.after.encode-enabled: the global dictionary is split into several buckets. To encode a column to int values more efficiently, the source dataset is repartitioned by the to-be-encoded column into the same number of partitions as the dictionary's bucket size. This sometimes has a side effect: repartitioning by a single column is more likely to cause serious data skew, so one task can take the majority of the time in the first layer's cuboid building. When you face this case, you can try repartitioning the encoded dataset by all RowKey columns to avoid the skew. The repartition size defaults to the max bucket size across all dictionaries, but you can set a different value via the option 'kylin.engine.spark.dataset.repartition.num.after.encoding'.
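To illustrate the skew issue described above, here is a minimal standalone sketch (plain Python, no Spark; partition assignment is simulated as `hash(key) % n`, and the dataset is invented for illustration):

```python
# Simulates why repartitioning by a single skewed column piles rows into one
# partition, while repartitioning by all RowKey columns spreads them out.
# Standalone illustration only; the real partitioning is done by Spark.
from collections import Counter

# Invented dataset: the value "US" dominates the first column.
rows = [("US", i) for i in range(900)] + [("DE", i) for i in range(100)]
n_partitions = 10

# Repartition by the single (skewed) column: every "US" row lands together.
by_single = Counter(hash(country) % n_partitions for country, _ in rows)

# Repartition by all RowKey columns: composite keys spread far more evenly.
by_rowkey = Counter(hash((country, i)) % n_partitions for country, i in rows)

print(max(by_single.values()))  # at least 900 rows stuck in one partition
print(max(by_rowkey.values()))  # close to the ideal 100 rows per partition
```

The one oversized partition in the first case is exactly the task that dominates the first layer's cuboid build time.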


Spark resources automatic adjustment strategy

Property | Default | Since
kylin.spark-conf.auto.prior | true | 4.0.0

For a CubeBuildJob or CubeMergeJob, it is important to allocate enough and properly sized resources (CPU/memory), mainly covering the following config entries:

- spark.driver.memory
- spark.executor.memory
- spark.executor.cores
- spark.executor.memoryOverhead
- spark.executor.instances
- spark.sql.shuffle.partitions

When `kylin.spark-conf.auto.prior` is set to true, Kylin will try to adjust the entries above according to:

- the count of cuboids to be built
- the (max) size of the fact table
- the resources available in the current resource manager's queue

Users can still override individual entries via `kylin.engine.spark-conf.XXX` at the Cube level. For details, see "How to improve cube building and query performance".
kylin.engine.spark-conf.XXX | null | 4.0.0

Users can set Spark conf entries for the Cube/Merge job at the Cube level.
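For example, a Cube-level override could look like the following (the `kylin.engine.spark-conf.` prefix comes from the entry above and the Spark property names are standard; the values are purely illustrative, not recommendations):

```properties
# Cube-level overrides; everything after the prefix is passed to Spark as-is.
kylin.engine.spark-conf.spark.executor.memory=8g
kylin.engine.spark-conf.spark.executor.cores=4
kylin.engine.spark-conf.spark.sql.shuffle.partitions=400
```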
kylin.engine.driver-memory-base | 1024 | 4.0.0

Driver memory (spark.driver.memory) is auto-adjusted based on the cuboid count and this configuration.

kylin.engine.driver-memory-strategy defines the level boundaries: for example, "2,20,100" translates into four cuboid-count ranges, from low to high:

  • Level 1: (0, 2)
  • Level 2: (2, 20)
  • Level 3: (20, 100)
  • Level 4: (100, +∞)

A specific cuboid count therefore maps to a level, and driver memory is calculated by the following formula:

min(kylin.engine.driver-memory-base * level, kylin.engine.driver-memory-maximum)
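The level lookup and the formula above can be sketched as follows (a hypothetical re-implementation for illustration; `driver_memory_mb` is not an actual Kylin function, the boundary handling at the thresholds is an assumption, and the defaults mirror the table entries):

```python
# Sketch of the driver-memory auto-adjustment described above.
# Defaults mirror kylin.engine.driver-memory-base (1024),
# kylin.engine.driver-memory-maximum (4096) and
# kylin.engine.driver-memory-strategy ("2,20,100").

def driver_memory_mb(cuboid_count, base=1024, maximum=4096, strategy=(2, 20, 100)):
    """Pick the level whose range contains cuboid_count, then apply
    min(base * level, maximum)."""
    level = 1
    for threshold in strategy:
        if cuboid_count >= threshold:  # exact boundary behavior is assumed
            level += 1
    return min(base * level, maximum)

print(driver_memory_mb(1))    # level 1 -> 1024 MB
print(driver_memory_mb(50))   # level 3 -> 3072 MB
print(driver_memory_mb(500))  # level 4 -> capped at 4096 MB
```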
kylin.engine.driver-memory-maximum | 4096 | 4.0.0 (see above)
kylin.engine.driver-memory-strategy | 2,20,100 | 4.0.0 (see above)
kylin.engine.base-executor-instance | 5 | 4.0.0
kylin.engine.spark.required-cores | 1 | 4.0.0
kylin.engine.executor-instance-strategy | 100,2,500,3,1000,4 | 4.0.0
kylin.engine.retry-memory-gradient | | 4.0.0


Global dictionary

Property | Default | Description | Since
kylin.dictionary.detect-data-skew-sample-enabled | | |
kylin.dictionary.detect-data-skew-sample-rate | | |
kylin.dictionary.detect-data-skew-percentage-threshold | | |
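The property names above suggest a sampling-based skew check: sample the column at some rate and flag skew when the most frequent value exceeds a percentage threshold. The following is a generic sketch of that idea only; it is inferred from the names and is not Kylin's actual implementation, and the function, defaults, and data are all invented:

```python
# Generic sampling-based data-skew detector, inferred from the property names
# above (detect-data-skew-sample-rate, detect-data-skew-percentage-threshold).
# NOT Kylin's implementation; it only demonstrates the general technique.
import random
from collections import Counter

def detect_skew(keys, sample_rate=0.01, percentage_threshold=0.2, seed=42):
    """Sample keys at sample_rate; report skew when the most frequent sampled
    key makes up more than percentage_threshold of the sample."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sample = [k for k in keys if rng.random() < sample_rate]
    if not sample:
        return False
    top_count = Counter(sample).most_common(1)[0][1]
    return top_count / len(sample) > percentage_threshold

skewed = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
uniform = [f"k{i}" for i in range(10000)]
print(detect_skew(skewed))   # True: "hot" dominates the sample
print(detect_skew(uniform))  # False: no value dominates
```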




