Build Engine Configuration

Property	Required	Priority	Datatype	Default	Description	Version	Reference
kylin.engine.spark.build-class-name	NO	MINOR	String	org.apache.kylin.engine.spark.job.CubeBuildJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.cluster-info-fetcher-class-name	NO	MINOR	String	org.apache.kylin.cluster.YarnInfoFetcher	Fetches YARN information for the Spark job.	4.0.0-ALPHA
kylin.engine.spark-conf.XXX	NO	MINOR	String		Spark configurations to override for the build job, such as "spark.driver.cores". If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.storage.provider	NO	MINOR	String	org.apache.kylin.common.storage.DefaultStorageProvider	The content summary objects returned by different cloud vendors differ, so a vendor-specific implementation may be needed. See org.apache.kylin.common.storage.IStorageProvider for details.	4.0.0-ALPHA
kylin.engine.spark.merge-class-name	NO	MINOR	String	org.apache.kylin.engine.spark.job.CubeMergeJob	For developers only. The class name used in spark-submit.	4.0.0-ALPHA
kylin.engine.spark.task-impact-instance-enabled	NO		Boolean	true	See kylin.engine.spark.task-core-factor. If kylin.engine.spark.task-impact-instance-enabled is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the Build Engine.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.spark.task-core-factor	NO		Integer	3		4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-base	YES		Integer	1024	Automatically adjusts spark.driver.memory for the Build Engine if kylin.engine.spark-conf.spark.driver.memory is not set.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-strategy	YES		Array	{"2", "20", "100"}
kylin.engine.driver-memory-maximum	YES		Integer	4096
kylin.engine.persist-flattable-threshold	NO		Integer	1	If the number of cuboids to be built from the flat table exceeds this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.	4.0.0-ALPHA
kylin.snapshot.parallel-build-timeout-seconds	NO	MAJOR		3600	Timeout, in seconds, for the parallel snapshot build, which is used to speed up snapshot builds.	4.0.0-ALPHA
kylin.snapshot.parallel-build-enabled	NO	MAJOR	Boolean	true	Whether to build snapshots in parallel to speed up snapshot builds.	4.0.0-ALPHA

kylin.spark-conf.auto.prior	NO	MINOR	Boolean	true	Whether to adjust Spark parameters adaptively.	4.0.0-ALPHA	Adaptively-adjust-spark-parameters
kylin.engine.submit-hadoop-conf-dir	YES	MAJOR	String	/etc/hadoop/conf	Set HADOOP_CONF_DIR for spark-submit.	4.0.0-ALPHA
kylin.storage.columnar.shard-size-mb	YES	MAJOR	Integer	128	The size, in MB, of each Parquet partition file of a cuboid.	4.0.0-ALPHA	ShardBy
kylin.storage.columnar.shard-rowcount	YES	MAJOR	Long	2500000	The number of rows in each Parquet partition file of a cuboid.
kylin.storage.columnar.shard-countdistinct-rowcount	YES	MAJOR	Long	1000000	The number of rows in each Parquet partition file of a cuboid when the shard column is a count-distinct column.
kylin.query.spark-engine.join-memory-fraction	NO		Double	0.3	Limit memory used by broadcast join.	4.0.0-ALPHA
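
As a rough illustration of how the properties above might be combined, the following kylin.properties fragment pins a few Spark settings for the build job and leaves the rest to Kylin's adaptive adjustment. The property names come from the table; the Spark values (2 cores, 4 executors, 2g of driver memory) are assumptions chosen only for illustration, not recommended settings.

```properties
# Illustrative values only; tune for your own cluster.

# Keep adaptive adjustment of Spark parameters enabled (the default).
kylin.spark-conf.auto.prior=true

# Explicit overrides use the kylin.engine.spark-conf.XXX pattern,
# where XXX is a Spark property name such as spark.driver.cores.
kylin.engine.spark-conf.spark.driver.cores=2

# Per the table, Kylin calculates spark.executor.instances only when this is not set.
kylin.engine.spark-conf.spark.executor.instances=4

# Per the table, spark.driver.memory is auto-adjusted only when this is not set.
kylin.engine.spark-conf.spark.driver.memory=2g

# HADOOP_CONF_DIR used for spark-submit (default shown).
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf

# Size of each Parquet partition file of a cuboid (default shown).
kylin.storage.columnar.shard-size-mb=128
```

Removing the three kylin.engine.spark-conf.* lines would leave those Spark properties to be adjusted by Kylin before the build job is submitted, as described in the table.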
