Apache Kylin : Analytical Data Warehouse for Big Data

Spark Job Option


| Property | Required | Priority | Datatype | Default | Description | Version | Reference |
|---|---|---|---|---|---|---|---|
| kylin.engine.spark.build-class-name | no | low | String | org.apache.kylin.engine.spark.job.CubeBuildJob | For developers only. The class name used in spark-submit. | 4.0+ | |
| kylin.engine.spark.cluster-info-fetcher-class-name | no | | String | org.apache.kylin.cluster.YarnInfoFetcher | Fetches YARN information for the Spark job. | | |
| kylin.engine.spark-conf.XXX | no | | String | | Spark configurations to override for the build job, such as "spark.driver.cores". If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.storage.provider | no | | String | org.apache.kylin.common.storage.DefaultStorageProvider | The content summary objects returned by different cloud vendors are not the same, so a vendor-specific implementation must be provided. To learn more, refer to org.apache.kylin.common.storage.IStorageProvider. | | |
| kylin.engine.spark.merge-class-name | no | | String | org.apache.kylin.engine.spark.job.CubeMergeJob | For developers only. The class name used in spark-submit. | | |
| kylin.engine.spark.task-impact-instance-enabled | no | | Boolean | true | See kylin.engine.spark.task-core-factor. If kylin.engine.spark.task-impact-instance-enabled is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the Build Engine. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.spark.task-core-factor | no | | Integer | 3 | | | |
| kylin.engine.driver-memory-base | no | | Integer | 1024 | Automatically adjusts spark.driver.memory for the Build Engine if kylin.engine.spark-conf.spark.driver.memory is not set. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.driver-memory-strategy | no | | Array | {"2", "20", "100"} | | | |
| kylin.engine.driver-memory-maximum | no | | Integer | 4096 | | | |
| kylin.engine.persist-flattable-threshold | no | | Integer | 1 | If the number of cuboids to be built from the flat table is greater than this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory. | 4.0+ | |
| kylin.snapshot.parallel-build-timeout-seconds | no | | | 3600 | To improve the speed of snapshot builds. | 4.0+ | |
| kylin.snapshot.parallel-build-enabled | no | | Boolean | true | | | |
| kylin.spark-conf.auto.prior | no | | Boolean | true | Whether to adjust Spark parameters adaptively. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.submit-hadoop-conf-dir | no | | String | /etc/hadoop/conf | Sets HADOOP_CONF_DIR for spark-submit. | | |
| kylin.storage.columnar.shard-size-mb | no | | Integer | 128 | The size of each Parquet partition file of a cuboid. | 4.0+ | ShardBy |
| kylin.storage.columnar.shard-rowcount | no | | Long | 2500000 | The number of rows in each Parquet partition file of a cuboid. | | |
| kylin.storage.columnar.shard-countdistinct-rowcount | no | | Long | 1000000 | The number of rows in each Parquet partition file of a cuboid when the shard column is a distinct column. | | |
| kylin.query.spark-engine.join-memory-fraction | no | | Double | 0.3 | Limits the memory used by broadcast joins. | | |
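
As a rough illustration of how these options might be set, the fragment below sketches a few entries for kylin.properties. The Spark values shown (2 cores, 2048m, 4 instances) are assumptions chosen only for the example, not recommendations; the remaining lines restate defaults listed in the table.

```properties
# Illustrative kylin.properties fragment; values are examples, not recommendations.

# Override individual Spark settings for build jobs with the
# kylin.engine.spark-conf. prefix (the "XXX" part is the Spark property name).
kylin.engine.spark-conf.spark.driver.cores=2
kylin.engine.spark-conf.spark.driver.memory=2048m
kylin.engine.spark-conf.spark.executor.instances=4

# Defaults from the table, written out explicitly.
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf
kylin.storage.columnar.shard-size-mb=128
kylin.storage.columnar.shard-rowcount=2500000
kylin.query.spark-engine.join-memory-fraction=0.3
```

According to the descriptions above, setting spark.executor.instances and spark.driver.memory explicitly in this way means Kylin does not calculate or adjust those two values itself; leaving them out lets the adaptive adjustment apply.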
