Build Engine Configuration

Spark Job Option

Property	Required	Priority	Datatype	Default	Description	Version	Reference
kylin.engine.spark.build-class-name	no	low	String	org.apache.kylin.engine.spark.job.CubeBuildJob	For developers only. The class name used in spark-submit.	4.0+
kylin.engine.spark.cluster-info-fetcher-class-name	no		String	org.apache.kylin.cluster.YarnInfoFetcher	Class used to fetch YARN information for the Spark job.
kylin.engine.spark-conf.XXX	no		String		Spark configurations to override for the build job, e.g. "spark.driver.cores". If these Spark properties are not set, Kylin adjusts them automatically before submitting the build job.	4.0+	Adaptively-adjust-spark-parameters
kylin.storage.provider	no		String	org.apache.kylin.common.storage.DefaultStorageProvider	Different cloud vendors return different ContentSummary objects, so a vendor-specific implementation needs to be provided. See org.apache.kylin.common.storage.IStorageProvider.
kylin.engine.spark.merge-class-name	no		String	org.apache.kylin.engine.spark.job.CubeMergeJob	For developers only. The class name used in spark-submit.
kylin.engine.spark.task-impact-instance-enabled	no		Boolean	true	If set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the build engine. See also kylin.engine.spark.task-core-factor.	4.0+	Adaptively-adjust-spark-parameters
kylin.engine.spark.task-core-factor	no		Integer	3		4.0+	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-base	no		Integer	1024	Base value for automatically adjusting spark.driver.memory for the build engine when kylin.engine.spark-conf.spark.driver.memory is not set.	4.0+	Adaptively-adjust-spark-parameters
kylin.engine.driver-memory-strategy	no		Array	{"2", "20", "100"}
kylin.engine.driver-memory-maximum	no		Integer	4096	Upper limit for the automatically adjusted spark.driver.memory.
kylin.engine.persist-flattable-threshold	no		Integer	1	If the number of cuboids to be built from the flat table exceeds this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.	4.0+
kylin.snapshot.parallel-build-timeout-seconds	no			3600	Timeout in seconds for parallel snapshot building. Set this if you want to speed up snapshot builds.	4.0+
kylin.snapshot.parallel-build-enabled	no		Boolean	true	Whether to build snapshots in parallel.	4.0+

kylin.spark-conf.auto.prior	no		Boolean	true	是否需要自动设置一些 SparkConf. If need to adjust spark parameters adaptively.	4.0+	Adaptively-adjust-spark-parameters
kylin.engine.submit-hadoop-conf-dir	no		String	/etc/hadoop/conf	Set HADOOP_CONF_DIR for spark-submit.
kylin.storage.columnar.shard-size-mb	no		Integer	128	The size, in MB, of each Parquet partition file of a cuboid.	4.0+	ShardBy
kylin.storage.columnar.shard-rowcount	no		Long	2500000	The number of rows in each Parquet partition file of a cuboid.
kylin.storage.columnar.shard-countdistinct-rowcount	no		Long	1000000	The number of rows in each Parquet partition file of a cuboid when the shard column is a count-distinct column.
kylin.query.spark-engine.join-memory-fraction	no		Double	0.3	Limits the memory used by broadcast joins. Open question: is this name correct? Why does it start with "query"?
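The override rule for kylin.engine.spark-conf.XXX and kylin.spark-conf.auto.prior can be illustrated with a small Python sketch: Spark properties set explicitly through the prefix win, and only unset ones are left for Kylin to adjust. The helper names (parse_properties, spark_overrides, left_to_auto_adjust) are hypothetical, and the assumption that stripping the prefix yields the Spark property name is inferred from entries such as kylin.engine.spark-conf.spark.driver.memory rather than stated outright on this page.

```python
# Minimal sketch: collect Spark overrides from "kylin.engine.spark-conf.XXX"
# entries. ASSUMPTION: the prefix is stripped and "XXX" is used verbatim as the
# Spark property name. The parser is a simplified, hypothetical helper, not
# Kylin's own properties loader.

SPARK_CONF_PREFIX = "kylin.engine.spark-conf."
AUTO_PRIOR_KEY = "kylin.spark-conf.auto.prior"


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def spark_overrides(props: dict[str, str]) -> dict[str, str]:
    """Map each prefixed entry to the Spark property it overrides."""
    return {
        key[len(SPARK_CONF_PREFIX):]: value
        for key, value in props.items()
        if key.startswith(SPARK_CONF_PREFIX)
    }


def left_to_auto_adjust(props: dict[str, str], spark_key: str) -> bool:
    """True if spark_key is not overridden and adaptive adjustment is enabled
    (kylin.spark-conf.auto.prior defaults to true)."""
    auto_prior = props.get(AUTO_PRIOR_KEY, "true").lower() == "true"
    return auto_prior and spark_key not in spark_overrides(props)


if __name__ == "__main__":
    sample = """
    # kylin.properties excerpt with illustrative values
    kylin.engine.spark-conf.spark.driver.cores=2
    kylin.engine.spark-conf.spark.executor.memory=4G
    kylin.engine.driver-memory-base=1024
    """
    props = parse_properties(sample)
    print(spark_overrides(props))
    # {'spark.driver.cores': '2', 'spark.executor.memory': '4G'}
    print(left_to_auto_adjust(props, "spark.driver.memory"))  # True
    print(left_to_auto_adjust(props, "spark.driver.cores"))   # False
```

Keeping the override lookup separate from the auto-adjust check mirrors the table: the prefix decides what is fixed, and kylin.spark-conf.auto.prior decides whether anything else is tuned.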
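The three driver-memory settings (base 1024, strategy {"2", "20", "100"}, maximum 4096) can be read as a step function of the cuboid count. The sketch below is a hypothetical illustration under stated assumptions that this page does not document: values are in MB, the strategy entries are ascending cuboid-count thresholds, each threshold the cuboid count exceeds adds one more base amount, and the result is capped at the maximum. Check it against the Kylin source before relying on it.

```python
# Hypothetical sketch of adaptive spark.driver.memory sizing.
# ASSUMPTIONS (not documented on this page): values are in MB, the strategy
# entries are ascending cuboid-count thresholds, each threshold the cuboid count
# exceeds adds one more "base", and the result is capped at the maximum.

DEFAULT_BASE_MB = 1024                 # kylin.engine.driver-memory-base
DEFAULT_STRATEGY = ("2", "20", "100")  # kylin.engine.driver-memory-strategy
DEFAULT_MAXIMUM_MB = 4096              # kylin.engine.driver-memory-maximum


def driver_memory_mb(cuboid_count: int,
                     base_mb: int = DEFAULT_BASE_MB,
                     strategy: tuple[str, ...] = DEFAULT_STRATEGY,
                     maximum_mb: int = DEFAULT_MAXIMUM_MB) -> int:
    """Return a driver memory size (MB) for a build with cuboid_count cuboids."""
    thresholds = sorted(int(value) for value in strategy)
    # Number of thresholds strictly below the cuboid count.
    level = sum(1 for threshold in thresholds if cuboid_count > threshold)
    return min(base_mb * (level + 1), maximum_mb)


if __name__ == "__main__":
    for cuboids in (1, 10, 50, 500):
        print(cuboids, driver_memory_mb(cuboids))
    # 1 1024
    # 10 2048
    # 50 3072
    # 500 4096
```

With the default values the largest step (4 × 1024 MB) equals the 4096 MB maximum, so under these assumptions the cap only takes effect if the base is raised or the maximum is lowered.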
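The kylin.storage.columnar.shard-* settings bound each Parquet file of a cuboid by size and by row count. As a rough illustration, the sketch below derives one file-count estimate from the data size and one from the row count (using the count-distinct row limit when the shard column is a count-distinct column), then averages them. The averaging rule and the function name estimate_file_count are assumptions made for illustration; this page does not say how Kylin combines the limits.

```python
import math

# Hypothetical sketch: estimate how many Parquet files one cuboid is split into.
# ASSUMPTION: a size-based estimate and a row-based estimate are averaged and
# rounded up; the real combination rule in Kylin may differ.


def estimate_file_count(total_size_mb: float,
                        total_rows: int,
                        shard_col_is_count_distinct: bool = False,
                        shard_size_mb: int = 128,                      # shard-size-mb
                        shard_rowcount: int = 2_500_000,               # shard-rowcount
                        shard_countdistinct_rowcount: int = 1_000_000  # shard-countdistinct-rowcount
                        ) -> int:
    """Return an estimated number of Parquet files for one cuboid."""
    by_size = math.ceil(total_size_mb / shard_size_mb)
    row_limit = (shard_countdistinct_rowcount if shard_col_is_count_distinct
                 else shard_rowcount)
    by_rows = math.ceil(total_rows / row_limit)
    # Average the two estimates, but never produce fewer than one file.
    return max(1, math.ceil((by_size + by_rows) / 2))


if __name__ == "__main__":
    print(estimate_file_count(1024, 10_000_000))        # 6
    print(estimate_file_count(1024, 10_000_000, True))  # 9
```

The second call shows why the count-distinct limit is smaller: the same cuboid is split into more, smaller files when its shard column is a count-distinct column.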