Kylin 4.0 Build Engine Configuration

Basic configuration

Property	Default	Desc	Since
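kylin.snapshot.parallel-build-enabled	true	Whether to build snapshot tables in parallel, to improve the speed of snapshot build.
kylin.snapshot.parallel-build-timeout-seconds	3600	Timeout, in seconds, for parallel snapshot building.	4.0.0-ALPHA
kylin.snapshot.shard-size-mb
kylin.storage.columnar.shard-size-mb	128	The maximum size (MB) of a pre-calculated cuboid Parquet file.	4.0.0-ALPHA
kylin.storage.columnar.shard-rowcount	2500000	The maximum number of rows in a pre-calculated cuboid Parquet file.	4.0.0-ALPHA
kylin.storage.columnar.shard-countdistinct-rowcount	1000000	The maximum number of rows in a pre-calculated cuboid Parquet file when the cuboid has a bitmap measure (cuboids with bitmaps are large).	4.0.0-ALPHA
kylin.storage.columnar.repartition-threshold-size-mb
kylin.engine.submit-hadoop-conf-dir	/etc/hadoop/conf	Sets HADOOP_CONF_DIR for spark-submit.	4.0.0-ALPHA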
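A minimal kylin.properties sketch for the basic settings, using the default values given in the engine property table further down this page. kylin.snapshot.shard-size-mb and kylin.storage.columnar.repartition-threshold-size-mb are left out because their defaults are not documented here. The values only restate defaults and are not tuning advice.

```properties
# Sketch only: basic build engine settings at the defaults listed on this page.
# Build snapshot tables in parallel, giving up after 3600 seconds.
kylin.snapshot.parallel-build-enabled=true
kylin.snapshot.parallel-build-timeout-seconds=3600
# Upper bounds for each pre-calculated cuboid Parquet file.
kylin.storage.columnar.shard-size-mb=128
kylin.storage.columnar.shard-rowcount=2500000
# Lower row limit for cuboids with bitmap (count distinct) measures.
kylin.storage.columnar.shard-countdistinct-rowcount=1000000
# HADOOP_CONF_DIR passed to spark-submit.
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf
```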

Advanced configuration

Property	Default	Since
kylin.engine.spark.cache-parent-dataset-storage-level	NONE	4.0.0
kylin.engine.spark.cache-parent-dataset-count	1	4.0.0
kylin.engine.build-base-cuboid-enabled	true	4.0.0
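A hedged sketch of turning on parent-dataset caching. It assumes that kylin.engine.spark.cache-parent-dataset-storage-level accepts Spark StorageLevel names (such as MEMORY_AND_DISK) and that the default NONE means no caching; this page does not confirm either point, so check the Kylin source before relying on it.

```properties
# Assumption: the value is a Spark StorageLevel name; NONE (default) presumably disables caching.
kylin.engine.spark.cache-parent-dataset-storage-level=MEMORY_AND_DISK
# Number of parent datasets to cache (default 1).
kylin.engine.spark.cache-parent-dataset-count=1
# Build the base cuboid (default true).
kylin.engine.build-base-cuboid-enabled=true
```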

Spark resources automatic adjustment strategy

Property	Default	Desc	Since
kylin.spark-conf.auto.prior	true	For CubeBuildJob and CubeMergeJob, it is important to allocate enough and appropriate resources (CPU/memory), mainly through the following entries: spark.driver.memory, spark.executor.memory, spark.executor.cores, spark.executor.memoryOverhead, spark.executor.instances, spark.sql.shuffle.partitions. When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust these entries according to: the number of cuboids to be built; the (maximum) size of the fact table; the resources available in the current resource manager's queue. Users can still override individual entries at the Cube level via `kylin.engine.spark-conf.XXX`. For details, see How to improve cube building and query performance.	4.0.0
kylin.engine.spark-conf.XXX	null	Sets the Spark configuration entry XXX at the Cube level. It is applied after automatic adjustment, so it overrides the adjusted value.	4.0.0
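kylin.engine.driver-memory-base	1024	Base value used to automatically adjust spark.driver.memory for the build engine when kylin.engine.spark-conf.spark.driver.memory is not set.	4.0.0-ALPHA
kylin.engine.driver-memory-maximum	4096	UPDATING	4.0.0-ALPHA
kylin.engine.driver-memory-strategy	{"2", "20", "100"}	UPDATING	4.0.0-ALPHA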
kylin.engine.base-executor-instance
kylin.engine.spark.required-cores
kylin.engine.executor-instance-strategy
kylin.engine.retry-memory-gradient
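A sketch of overriding individual Spark entries on top of automatic adjustment. The extraJavaOptions line is the example this page itself gives; the executor memory and shuffle partition values are hypothetical illustrations, not recommendations. Entries with the kylin.engine.spark-conf. prefix are applied after automatic adjustment, so they take precedence over the adjusted values.

```properties
# Keep automatic adjustment of Spark resources enabled (default).
kylin.spark-conf.auto.prior=true
# Applied after auto adjustment; this example comes from this page.
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
# Hypothetical values for illustration only.
kylin.engine.spark-conf.spark.executor.memory=4g
kylin.engine.spark-conf.spark.sql.shuffle.partitions=200
```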

Global dictionary

Data skew

Please remove the following:

Property	Default	Description	Version
kylin.engine.spark.build-class-name	org.apache.kylin.engine.spark.job.CubeBuildJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.cluster-info-fetcher-class-name	org.apache.kylin.cluster.YarnInfoFetcher	For developers only. Fetches YARN information for the Spark job.	4.0.0-ALPHA
kylin.engine.spark-conf.XXX		Before Kylin submits a cubing job, major properties (cores and memory) are adjusted automatically if kylin.spark-conf.auto.prior is set to true. After this adjustment, the Spark configuration is overwritten by this property. For example, to set spark.driver.extraJavaOptions=-Dhdp.version=current, add the following line to kylin.properties: kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current	4.0.0-ALPHA
kylin.storage.provider	org.apache.kylin.common.storage.DefaultStorageProvider	The content summary objects returned by different cloud vendors differ, so a vendor-specific implementation must be provided. For details, see org.apache.kylin.common.storage.IStorageProvider.	4.0.0-ALPHA
kylin.engine.spark.merge-class-name	org.apache.kylin.engine.spark.job.CubeMergeJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.task-impact-instance-enabled	true	UPDATING	4.0.0-ALPHA
kylin.engine.spark.task-core-factor	3	UPDATING	4.0.0-ALPHA
kylin.engine.driver-memory-base	1024	Automatically adjusts *spark.driver.memory* for the build engine if kylin.engine.spark-conf.spark.driver.memory is not set.	4.0.0-ALPHA
kylin.engine.driver-memory-strategy	{"2", "20", "100"}	UPDATING	4.0.0-ALPHA
kylin.engine.driver-memory-maximum	4096	UPDATING	4.0.0-ALPHA
kylin.engine.persist-flattable-threshold	1	If the number of cuboids to be built from the flat table is greater than this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.	4.0.0-ALPHA
kylin.snapshot.parallel-build-timeout-seconds	3600	Timeout, in seconds, for parallel snapshot building, which improves the speed of snapshot build.	4.0.0-ALPHA
kylin.snapshot.parallel-build-enabled	true	UPDATING
kylin.spark-conf.auto.prior	true	Enables adaptive adjustment of Spark parameters.	4.0.0-ALPHA
kylin.engine.submit-hadoop-conf-dir	/etc/hadoop/conf	Set HADOOP_CONF_DIR for spark-submit.	4.0.0-ALPHA
kylin.storage.columnar.shard-size-mb	128	The maximum size of a pre-calculated cuboid Parquet file.	4.0.0-ALPHA
kylin.storage.columnar.shard-rowcount	2500000	The maximum number of rows in a pre-calculated cuboid Parquet file.	4.0.0-ALPHA
kylin.storage.columnar.shard-countdistinct-rowcount	1000000	The maximum number of rows in a pre-calculated cuboid Parquet file when the cuboid has a bitmap measure (cuboids with bitmaps are large).	4.0.0-ALPHA
kylin.query.spark-engine.join-memory-fraction	0.3	Limits the memory used by Sparder's broadcast joins, because broadcast joins can cause instability.	4.0.0-ALPHA