Build Engine Configuration

Property	Importance	Datatype	Configuration Level	Default	Description	Version	Reference
kylin.engine.spark.build-class-name	TRIVIAL	String	PROCESS	org.apache.kylin.engine.spark.job.CubeBuildJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.cluster-info-fetcher-class-name	TRIVIAL	String	PROCESS	org.apache.kylin.cluster.YarnInfoFetcher	For developers only. The class that fetches YARN information about the Spark job.	4.0.0-ALPHA
kylin.engine.spark-conf.XXX	MINOR	String	PROCESS	Null	Spark properties to override for the build job, such as "spark.driver.cores". If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.storage.provider	TRIVIAL	String	PROCESS	org.apache.kylin.common.storage.DefaultStorageProvider	The content summary objects returned by different cloud vendors are not the same, so a vendor-specific implementation must be provided. See org.apache.kylin.common.storage.IStorageProvider for details.	4.0.0-ALPHA
kylin.engine.spark.merge-class-name	TRIVIAL	String	PROCESS	org.apache.kylin.engine.spark.job.CubeMergeJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.task-impact-instance-enabled		Boolean	PROCESS	true	See also kylin.engine.spark.task-core-factor. If kylin.engine.spark.task-impact-instance-enabled is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the build engine.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.spark.task-core-factor	MEDIUM	Integer	PROCESS	3	TO BE UPDATED	4.0.0-ALPHA
kylin.engine.driver-memory-base	MEDIUM	Integer	PROCESS	1024	Automatically adjusts spark.driver.memory for the build engine if kylin.engine.spark-conf.spark.driver.memory is not set.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-strategy	MEDIUM	Array	PROCESS	{"2", "20", "100"}	TO BE UPDATED	4.0.0-ALPHA
kylin.engine.driver-memory-maximum	MEDIUM	Integer	PROCESS	4096	TO BE UPDATED	4.0.0-ALPHA
kylin.engine.persist-flattable-threshold	MEDIUM	Integer	PROCESS	1	If the number of cuboids to be built from the flat table exceeds this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.	4.0.0-ALPHA
kylin.snapshot.parallel-build-timeout-seconds	MAJOR	Integer	PROCESS	3600	To improve the speed of snapshot build.	4.0.0-ALPHA
kylin.snapshot.parallel-build-enabled	MAJOR	Boolean	PROCESS	true	TO BE UPDATED
kylin.spark-conf.auto.prior	MINOR	Boolean	PROCESS	true	Enables adaptive adjustment of Spark parameters.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.submit-hadoop-conf-dir	MAJOR	String	PROCESS	/etc/hadoop/conf	Set HADOOP_CONF_DIR for spark-submit.	4.0.0-ALPHA
kylin.storage.columnar.shard-size-mb	MAJOR	Integer	CUBE	128	The maximum size, in MB, of a pre-calculated cuboid Parquet file.	4.0.0-ALPHA	ShardBy
kylin.storage.columnar.shard-rowcount	MAJOR	Long	CUBE	2500000	The maximum number of rows in a pre-calculated cuboid Parquet file.	4.0.0-ALPHA
kylin.storage.columnar.shard-countdistinct-rowcount	MAJOR	Long	CUBE	1000000	The maximum number of rows in a pre-calculated cuboid Parquet file when the cuboid has a bitmap measure. (A cuboid with a bitmap measure is large.)	4.0.0-ALPHA
kylin.query.spark-engine.join-memory-fraction	MEDIUM	Double	PROCESS	0.3	Limits the memory used by broadcast joins in Sparder. (Broadcast joins can cause instability.)	4.0.0-ALPHA
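
As an illustration of the kylin.engine.spark-conf.XXX pattern in the table above, a kylin.properties fragment might look like the sketch below. The Spark property names after the prefix are the ones the table mentions; every value is a placeholder chosen for the example, not a recommendation or a Kylin default. Per the table, any of these Spark properties that is left unset is adjusted by Kylin before the build job is submitted.

```properties
# Illustrative overrides only; all values here are assumptions for this example.
# Each key is kylin.engine.spark-conf. followed by an ordinary Spark property name.
kylin.engine.spark-conf.spark.driver.cores=2
kylin.engine.spark-conf.spark.driver.memory=2G
kylin.engine.spark-conf.spark.executor.instances=4

# Passed to spark-submit as HADOOP_CONF_DIR (the value shown is the default from the table).
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf
```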

File Name	Content	Comment
cubing_detect_items.json
sampling_detect_items.json
count_distinct.json
resource_paths.json
