Apache Kylin : Analytical Data Warehouse for Big Data
Spark Job Option
Property: kylin.engine.spark.build-class-name
Required: no | Priority: low | Datatype: String | Default: org.apache.kylin.engine.spark.job.CubeBuildJob | Version: 4.0+
Description: For developers only. The class name used in spark-submit.

Property: kylin.engine.spark.cluster-info-fetcher-class-name
Required: no | Datatype: String | Default: org.apache.kylin.cluster.YarnInfoFetcher
Description: Fetches YARN information for the Spark job.

Property: kylin.engine.spark-conf.XXX
Required: no | Datatype: String | Version: 4.0+ | Reference: Adaptively-adjust-spark-parameters
Description: Spark configurations to override for the build job, for example "spark.driver.cores" (set as kylin.engine.spark-conf.spark.driver.cores). If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job.

Property: kylin.storage.provider
Required: no | Datatype: String | Default: org.apache.kylin.common.storage.DefaultStorageProvider
Description: The content summary objects returned by different cloud vendors are not the same, so a vendor-specific implementation needs to be provided. See org.apache.kylin.common.storage.IStorageProvider to learn more.
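
The override rule for kylin.engine.spark-conf.XXX can be pictured with a small sketch. It assumes a plain dictionary of kylin.properties entries, strips the kylin.engine.spark-conf. prefix to get the Spark key, and lets explicitly set values win over adaptively computed ones. The helper names are hypothetical, the "auto" values are made up for illustration, and treating kylin.spark-conf.auto.prior as a simple on/off switch is an assumption.

```python
# Minimal sketch, not Kylin source code: every function name here is hypothetical.
from typing import Dict, List

# Prefix used by the kylin.engine.spark-conf.XXX properties.
SPARK_CONF_PREFIX = "kylin.engine.spark-conf."


def spark_conf_overrides(kylin_props: Dict[str, str]) -> Dict[str, str]:
    """Keep only kylin.engine.spark-conf.* entries and strip the Kylin prefix."""
    return {
        key[len(SPARK_CONF_PREFIX):]: value
        for key, value in kylin_props.items()
        if key.startswith(SPARK_CONF_PREFIX)
    }


def merge_with_auto(user_conf: Dict[str, str],
                    auto_conf: Dict[str, str],
                    auto_enabled: bool = True) -> Dict[str, str]:
    """Auto-computed values only fill keys the user left unset.

    auto_enabled stands in for kylin.spark-conf.auto.prior (assumed semantics).
    """
    merged = dict(auto_conf) if auto_enabled else {}
    merged.update(user_conf)  # explicit user settings always win
    return merged


def to_submit_args(spark_conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit '--conf key=value' pairs in sorted key order."""
    args: List[str] = []
    for key in sorted(spark_conf):
        args += ["--conf", f"{key}={spark_conf[key]}"]
    return args


if __name__ == "__main__":
    props = {
        "kylin.engine.spark-conf.spark.driver.cores": "2",
        "kylin.storage.provider": "org.apache.kylin.common.storage.DefaultStorageProvider",
    }
    auto = {"spark.driver.cores": "1", "spark.driver.memory": "2048m"}  # made-up values
    conf = merge_with_auto(spark_conf_overrides(props), auto)
    print(to_submit_args(conf))
    # ['--conf', 'spark.driver.cores=2', '--conf', 'spark.driver.memory=2048m']
```

Keeping the merge as "auto first, then user on top" makes the precedence explicit: a property the user sets is never touched by the adaptive step.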

Property: kylin.engine.spark.merge-class-name
Required: no | Datatype: String | Default: org.apache.kylin.engine.spark.job.CubeMergeJob
Description: For developers only. The class name used in spark-submit.

Property: kylin.engine.spark.task-impact-instance-enabled
Required: no | Datatype: Boolean | Default: true | Version: 4.0+ | Reference: Adaptively-adjust-spark-parameters
Description: If this is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the build engine. See also kylin.engine.spark.task-core-factor.

Property: kylin.engine.spark.task-core-factor
Required: no | Datatype: Integer | Default: 3

Property: kylin.engine.driver-memory-base
Required: no | Datatype: Integer | Default: 1024 (MB) | Version: 4.0+ | Reference: Adaptively-adjust-spark-parameters
Description: Used to auto-adjust spark.driver.memory for the build engine if kylin.engine.spark-conf.spark.driver.memory is not set.

Property: kylin.engine.driver-memory-strategy
Required: no | Datatype: Array | Default: {"2", "20", "100"}
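
The driver-memory settings above only name their knobs, so the following is a hedged sketch of one plausible way they fit together, not Kylin's confirmed algorithm. It assumes the cuboid count is compared against the kylin.engine.driver-memory-strategy thresholds to pick a tier, that memory grows by one kylin.engine.driver-memory-base step per tier, and that the result is capped by kylin.engine.driver-memory-maximum (default 4096). It also encodes the stated rule that spark.executor.instances is only calculated when kylin.engine.spark.task-impact-instance-enabled is true and the user has not set it. All function names are hypothetical.

```python
# Hypothetical sketch; the tiering rule is an assumption, not Kylin source code.
from bisect import bisect_left
from typing import Dict, Sequence


def auto_driver_memory_mb(cuboid_count: int,
                          base_mb: int = 1024,
                          strategy: Sequence[int] = (2, 20, 100),
                          maximum_mb: int = 4096) -> int:
    """Pick a memory tier from the cuboid count and cap it at the maximum."""
    # Tier = number of strategy thresholds strictly below the cuboid count.
    tier = bisect_left(sorted(strategy), cuboid_count)
    return min(base_mb * (tier + 1), maximum_mb)


def driver_memory_setting(spark_conf: Dict[str, str], cuboid_count: int) -> str:
    """A user-supplied spark.driver.memory always wins over the computed value."""
    if "spark.driver.memory" in spark_conf:
        return spark_conf["spark.driver.memory"]
    return f"{auto_driver_memory_mb(cuboid_count)}m"


def should_compute_executor_instances(spark_conf: Dict[str, str],
                                      task_impact_instance_enabled: bool = True) -> bool:
    """Executor instances are computed only if enabled and not set explicitly."""
    return task_impact_instance_enabled and "spark.executor.instances" not in spark_conf


if __name__ == "__main__":
    for n in (1, 10, 50, 500):
        print(n, auto_driver_memory_mb(n))
```

Under these assumptions the four tiers with default values come out as 1024, 2048, 3072 and 4096 MB, which is why the cap never bites with the defaults; it only matters when the base is raised.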

Property: kylin.engine.driver-memory-maximum
Required: no | Datatype: Integer | Default: 4096 (MB)

Property: kylin.engine.persist-flattable-threshold
Required: no | Datatype: Integer | Default: 1 | Version: 4.0+
Description: If the number of cuboids to be built from the flat table is greater than this threshold, the flat table is persisted into $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.

Property: kylin.snapshot.parallel-build-timeout-seconds
Required: no | Default: 3600 | Version: 4.0+
Description: To improve the speed of snapshot building.

Property: kylin.snapshot.parallel-build-enabled
Required: no | Datatype: Boolean | Default: true

Property: kylin.spark-conf.auto.prior
Required: no | Datatype: Boolean | Default: true | Version: 4.0+ | Reference: Adaptively-adjust-spark-parameters
Description: Whether to adjust Spark parameters adaptively.

Property: kylin.engine.submit-hadoop-conf-dir
Required: no | Datatype: String | Default: /etc/hadoop/conf
Description: Sets HADOOP_CONF_DIR for spark-submit.
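
Two of the rules above reduce to simple decisions, sketched here under stated assumptions. The flat table is persisted only when the cuboid count exceeds kylin.engine.persist-flattable-threshold, at the $HDFS_WORKING_DIR/job_tmp/flat_table location given in the description (the real job may add further sub-directories; the working directory value below is made up). The second helper just builds a child-process environment that sets HADOOP_CONF_DIR the way kylin.engine.submit-hadoop-conf-dir describes. Both function names are hypothetical.

```python
# Hypothetical helpers illustrating the documented rules, not Kylin source code.
import posixpath
from typing import Dict, Optional


def flat_table_persist_path(cuboid_count: int,
                            hdfs_working_dir: str,
                            threshold: int = 1) -> Optional[str]:
    """Return where the flat table would be persisted, or None if it stays in memory."""
    if cuboid_count > threshold:  # strictly greater than the threshold
        return posixpath.join(hdfs_working_dir, "job_tmp", "flat_table")
    return None


def spark_submit_env(base_env: Dict[str, str],
                     hadoop_conf_dir: str = "/etc/hadoop/conf") -> Dict[str, str]:
    """Copy the environment and point HADOOP_CONF_DIR at the configured directory."""
    env = dict(base_env)
    env["HADOOP_CONF_DIR"] = hadoop_conf_dir
    return env


if __name__ == "__main__":
    print(flat_table_persist_path(3, "hdfs://nameservice/kylin"))  # made-up working dir
    # hdfs://nameservice/kylin/job_tmp/flat_table
```

With the default threshold of 1, any build that produces two or more cuboids from the same flat table triggers persistence.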
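
Property: kylin.storage.columnar.shard-size-mb
Required: no | Datatype: Integer | Default: 128 | Version: 4.0+ | Reference: ShardBy
Description: The size, in MB, of each Parquet partition file of a cuboid.

Property: kylin.storage.columnar.shard-rowcount
Required: no | Datatype: Long | Default: 2500000
Description: The number of rows in each Parquet partition file of a cuboid.

Property: kylin.storage.columnar.shard-countdistinct-rowcount
Required: no | Datatype: Long | Default: 1000000
Description: The number of rows in each Parquet partition file of a cuboid when the shard column is a count distinct column.

Property: kylin.query.spark-engine.join-memory-fraction
Required: no | Datatype: Double | Default: 0.3
Description: Limits the memory used by broadcast joins.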
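
To see what the shard settings imply, the sketch below estimates how many Parquet files a cuboid would need by size and by row count, assuming the defaults of 128 MB, 2500000 rows, and 1000000 rows for the count distinct case. It deliberately returns both estimates instead of a single number, because how Kylin combines them when it repartitions is not described here. The function name and the MiB interpretation of "MB" are assumptions.

```python
# Hypothetical estimate of Parquet file counts per cuboid; not Kylin's repartition code.
import math
from typing import Tuple

MB = 1024 * 1024  # assumes shard-size-mb means mebibytes


def shard_estimates(total_bytes: int,
                    row_count: int,
                    has_count_distinct: bool = False,
                    shard_size_mb: int = 128,
                    shard_rowcount: int = 2_500_000,
                    shard_countdistinct_rowcount: int = 1_000_000) -> Tuple[int, int]:
    """Return (files needed by size, files needed by row count), each at least 1."""
    rows_per_file = shard_countdistinct_rowcount if has_count_distinct else shard_rowcount
    by_size = max(1, math.ceil(total_bytes / (shard_size_mb * MB)))
    by_rows = max(1, math.ceil(row_count / rows_per_file))
    return by_size, by_rows


if __name__ == "__main__":
    # A 1000 MB cuboid with 10 million rows.
    print(shard_estimates(1000 * MB, 10_000_000))  # (8, 4)
    print(shard_estimates(1000 * MB, 10_000_000, has_count_distinct=True))  # (8, 10)
```

The lower row limit for count distinct columns reflects that such rows tend to be much wider, so fewer of them fit comfortably in one file.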