Build Engine Configuration

Property	Required	Priority	Datatype	Configuration Level	Default	Description	Version	Reference
kylin.engine.spark.build-class-name	NO	MINOR	String	PROCESS	org.apache.kylin.engine.spark.job.CubeBuildJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.cluster-info-fetcher-class-name	NO	MINOR	String	PROCESS	org.apache.kylin.cluster.YarnInfoFetcher	For developers only. The class that fetches YARN information for the Spark job.	4.0.0-ALPHA
kylin.engine.spark-conf.XXX	NO	MINOR	String	PROCESS	Null	Spark configurations to override for build jobs, such as "spark.driver.cores". If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.storage.provider	NO	MINOR	String	PROCESS	org.apache.kylin.common.storage.DefaultStorageProvider	The content summary objects returned by different cloud vendors differ, so a vendor-specific implementation may be needed. See org.apache.kylin.common.storage.IStorageProvider for details.	4.0.0-ALPHA
kylin.engine.spark.merge-class-name	NO	MINOR	String	PROCESS	org.apache.kylin.engine.spark.job.CubeMergeJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.task-impact-instance-enabled	NO		Boolean	PROCESS	true	See also kylin.engine.spark.task-core-factor. If kylin.engine.spark.task-impact-instance-enabled is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the build engine.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.spark.task-core-factor	NO		Integer	PROCESS	3		4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-base	YES		Integer	PROCESS	1024	Automatically adjusts spark.driver.memory for the build engine if kylin.engine.spark-conf.spark.driver.memory is not set.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-strategy	YES		Array	PROCESS	{"2", "20", "100"}
kylin.engine.driver-memory-maximum	YES		Integer	PROCESS	4096
kylin.engine.persist-flattable-threshold	NO		Integer	PROCESS	1	If the number of cuboids to be built from the flat table exceeds this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.	4.0.0-ALPHA
kylin.snapshot.parallel-build-timeout-seconds	NO	MAJOR		PROCESS	3600	Timeout, in seconds, for the parallel snapshot build.	4.0.0-ALPHA
kylin.snapshot.parallel-build-enabled	NO	MAJOR	Boolean	PROCESS	true	Whether to build snapshots in parallel to speed up the snapshot build.	4.0.0-ALPHA
kylin.spark-conf.auto.prior	NO	MINOR	Boolean	PROCESS	true	Whether to adjust Spark parameters adaptively.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.submit-hadoop-conf-dir	YES	MAJOR	String	PROCESS	/etc/hadoop/conf	Set HADOOP_CONF_DIR for spark-submit.	4.0.0-ALPHA
kylin.storage.columnar.shard-size-mb	YES	MAJOR	Integer	CUBE	128	The size, in MB, of each Parquet partition file of a cuboid.	4.0.0-ALPHA	ShardBy
kylin.storage.columnar.shard-rowcount	YES	MAJOR	Long	CUBE	2500000	The maximum number of rows in each Parquet partition file of a cuboid.
kylin.storage.columnar.shard-countdistinct-rowcount	YES	MAJOR	Long	CUBE	1000000	The number of rows in each Parquet partition file of a cuboid when the shard column is a count-distinct column.
kylin.query.spark-engine.join-memory-fraction	NO		Double	PROCESS	0.3	Limit memory used by broadcast join.	4.0.0-ALPHA
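
As a rough illustration of how the PROCESS-level properties in the table might be set, here is a minimal kylin.properties sketch. The Spark override values (driver cores, executor instances) are made-up examples rather than recommendations. The remaining values simply repeat the defaults listed in the table. CUBE-level properties such as kylin.storage.columnar.shard-size-mb are left out of this sketch.

```properties
# Illustrative overrides only; the values below are examples, not tuning advice.

# Anything after the "kylin.engine.spark-conf." prefix is passed as a Spark property
# for build jobs. Setting a property here pins it instead of letting Kylin adjust it.
kylin.engine.spark-conf.spark.driver.cores=2
kylin.engine.spark-conf.spark.executor.instances=4

# HADOOP_CONF_DIR used for spark-submit (table default shown).
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf

# Snapshot build settings (table defaults shown).
kylin.snapshot.parallel-build-enabled=true
kylin.snapshot.parallel-build-timeout-seconds=3600
```

Because kylin.engine.spark-conf.spark.executor.instances is set explicitly in this sketch, the table's description of kylin.engine.spark.task-impact-instance-enabled implies that Kylin would not calculate spark.executor.instances itself.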
