Apache Kylin : Analytical Data Warehouse for Big Data
Some properties are not listed here because they are internal and not intended for Kylin users.
Basic configuration
Property | Default | Description | Since |
---|---|---|---|
kylin.snapshot.parallel-build-enabled | |||
kylin.snapshot.parallel-build-timeout-seconds | |||
kylin.snapshot.shard-size-mb | |||
kylin.storage.columnar.shard-size-mb | | The max size of a pre-calculated cuboid parquet file. ||
kylin.storage.columnar.shard-rowcount | 2500000 |||
kylin.storage.columnar.shard-countdistinct-rowcount | |||
kylin.storage.columnar.repartition-threshold-size-mb | |||
Advanced configuration
Property | Default | Description | Since |
---|---|---|---|
kylin.engine.submit-hadoop-conf-dir | null | ||
kylin.engine.spark.cache-parent-dataset-storage-level | NONE | | 4.0.0 |
kylin.engine.spark.cache-parent-dataset-count | 1 | | 4.0.0 |
kylin.engine.build-base-cuboid-enabled | true | | 4.0.0 |
kylin.engine.spark.repartition.dataset.after.encode-enabled | false | The global dictionary is split into several buckets. To encode a column to int values more efficiently, the source dataset is repartitioned by the to-be-encoded column into the same number of partitions as the dictionary's bucket size. This sometimes has a side effect: repartitioning by a single column is more likely to cause serious data skew, so that one task takes the majority of the time in the first layer's cuboid building. When faced with this case, you can try repartitioning the encoded dataset by all RowKey columns to avoid the skew. The repartition size defaults to the max bucket size of all dictionaries, but you can also set another value via 'kylin.engine.spark.repartition.dataset.after.encode.num'. | 4.0.0 |
kylin.engine.spark.repartition.dataset.after.encode.num | 0 | see above | 4.0.0 |
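For instance, enabling the RowKey repartitioning described above might look like this in kylin.properties (the property names come from the rows above; the partition count 1000 is only an illustrative value):

```properties
# Repartition the encoded dataset by all RowKey columns instead of the single
# encoded column, to avoid data skew in the first cuboid-building layer
kylin.engine.spark.repartition.dataset.after.encode-enabled=true
# Explicit repartition size; when left at 0, the max bucket size of all
# dictionaries is used instead
kylin.engine.spark.repartition.dataset.after.encode.num=1000
```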
Spark resources automatic adjustment strategy
Property | Default | Description | Since |
---|---|---|---|
kylin.spark-conf.auto.prior | true | For a CubeBuildJob or CubeMergeJob, it is important to allocate enough and proper resources (CPU/memory), mainly via the following config entries: spark.driver.memory, spark.executor.memory, spark.executor.cores, spark.executor.memoryOverhead, spark.executor.instances, and spark.sql.shuffle.partitions. When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust these entries according to: the count of cuboids to be built, the (max) size of the fact table, and the available resources in the current resource manager's queue. Users can still choose to override some config via `kylin.engine.spark-conf.XXX` at the Cube level. See "How to improve cube building and query performance" for details. | 4.0.0 |
kylin.engine.spark-conf.XXX | null | Users can set the Spark conf of the Cube/Merge job at the Cube level. | 4.0.0 |
kylin.engine.driver-memory-base | 1024 | Driver memory (spark.driver.memory) is auto-adjusted according to the cuboid count. kylin.engine.driver-memory-strategy decides the level: for example, "2,20,100" translates to four cuboid-count ranges, from low to high: (0, 2], (2, 20], (20, 100], (100, +∞), i.e. levels 1 to 4. So we can find the proper level for a specific cuboid count: 12 falls in level 2, and 230 in level 4. Driver memory is then calculated as min(driver-memory-base * level, driver-memory-maximum). | 4.0.0 |
kylin.engine.driver-memory-maximum | 4096 | See above. | 4.0.0 |
kylin.engine.driver-memory-strategy | 2,20,100 | See above. | 4.0.0 |
kylin.engine.base-executor-instance | 5 | | 4.0.0 |
kylin.engine.spark.required-cores | 1 | | 4.0.0 |
kylin.engine.executor-instance-strategy | 100,2,500,3,1000,4 | | 4.0.0 |
kylin.engine.retry-memory-gradient | | | 4.0.0 |
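To make the driver-memory leveling concrete, here is a small Python sketch (not Kylin code) of how a cuboid count maps to a level and a memory setting under the defaults above; the min(base * level, maximum) formula is a reconstruction consistent with the defaults (1024 MB base, 4096 MB cap):

```python
def driver_memory_mb(cuboid_count, strategy=(2, 20, 100), base=1024, maximum=4096):
    """Pick a level from the cuboid-count ranges, then scale base memory.

    strategy "2,20,100" yields the ranges (0,2], (2,20], (20,100], (100,+inf),
    i.e. levels 1..4; memory is base * level, capped at maximum.
    """
    level = 1
    for threshold in strategy:
        if cuboid_count > threshold:
            level += 1
    return min(base * level, maximum)

# A cuboid count of 12 falls in (2, 20] -> level 2 -> 2048 MB
print(driver_memory_mb(12))   # 2048
# 230 falls in (100, +inf) -> level 4 -> capped at 4096 MB
print(driver_memory_mb(230))  # 4096
```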
Global dictionary
Property | Default | Description | Since |
---|---|---|---|
kylin.dictionary.detect-data-skew-sample-enabled | |||
kylin.dictionary.detect-data-skew-sample-rate | |||
kylin.dictionary.detect-data-skew-percentage-threshold |||
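The property names suggest a sample-based skew check: sample the to-be-encoded column at some rate and flag it as skewed when one value accounts for more than a percentage threshold of the sample. A minimal Python sketch of that idea follows; the semantics here are an assumption based on the property names, not taken from Kylin source:

```python
import random
from collections import Counter

def detect_data_skew(values, sample_rate=0.1, percentage_threshold=0.3, seed=42):
    """Sample `values` at `sample_rate`; report skew if the most frequent
    sampled value exceeds `percentage_threshold` of the sample.
    (Illustrative only -- parameter semantics are assumed, not Kylin's.)"""
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < sample_rate]
    if not sample:
        return False
    top_count = Counter(sample).most_common(1)[0][1]
    return top_count / len(sample) > percentage_threshold

# A column where half the rows share one value is flagged as skewed
skewed = ["hot"] * 5000 + list(range(5000))
print(detect_data_skew(skewed))              # True
# A column of all-distinct values is not
print(detect_data_skew(list(range(10000))))  # False
```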
Please remove the following:
kylin.engine.spark.build-class-name
org.apache.kylin.engine.spark.job.CubeBuildJob
kylin.engine.spark.cluster-info-fetcher-class-name
org.apache.kylin.cluster.YarnInfoFetcher
kylin.engine.spark-conf.XXX
- Before Kylin submits a cubing job, some major properties (cores and memory) are automatically adjusted (if kylin.spark-conf.auto.prior is set to true).
- After the automatic adjustment, the Spark conf is overwritten by this property. For example, to set spark.driver.extraJavaOptions=-Dhdp.version=current, add the following line to kylin.properties:
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
kylin.storage.provider
org.apache.kylin.common.storage.DefaultStorageProvider
The content summary objects returned by different cloud vendors are not the same, so a targeted implementation needs to be provided.
You can refer to org.apache.kylin.common.storage.IStorageProvider to learn more.
kylin.engine.spark.merge-class-name
org.apache.kylin.engine.spark.job.CubeMergeJob
kylin.engine.spark.task-impact-instance-enabled
kylin.engine.spark.task-core-factor
kylin.engine.driver-memory-base
kylin.engine.driver-memory-strategy
{"2", "20", "100"}
kylin.engine.driver-memory-maximum
kylin.engine.persist-flattable-threshold
kylin.snapshot.parallel-build-timeout-seconds
kylin.snapshot.parallel-build-enabled
kylin.spark-conf.auto.prior
kylin.engine.submit-hadoop-conf-dir
Set HADOOP_CONF_DIR for spark-submit.
kylin.storage.columnar.shard-size-mb
The max size of a pre-calculated cuboid parquet file.
kylin.storage.columnar.shard-rowcount
2500000
kylin.storage.columnar.shard-countdistinct-rowcount
kylin.query.spark-engine.join-memory-fraction