
Apache Kylin : Analytical Data Warehouse for Big Data



Basic configuration

Property | Default | Description | Since
kylin.snapshot.parallel-build-enabled



kylin.snapshot.parallel-build-timeout-seconds



kylin.snapshot.shard-size-mb



kylin.storage.columnar.shard-size-mb



kylin.storage.columnar.shard-rowcount



kylin.storage.columnar.shard-countdistinct-rowcount



kylin.storage.columnar.repartition-threshold-size-mb



kylin.engine.submit-hadoop-conf-dir




Advanced configuration

Property | Default | Description | Since
kylin.engine.spark.cache-parent-dataset-storage-level
NONE
4.0.0
kylin.engine.spark.cache-parent-dataset-count
1
4.0.0
kylin.engine.build-base-cuboid-enabled
true
4.0.0

Spark resources automatic adjustment strategy

Property | Default | Description | Since
kylin.spark-conf.auto.prior
true
For a CubeBuildJob or CubeMergeJob, it is important to allocate sufficient and appropriate resources (CPU and memory), mainly through the following config entries:
- spark.driver.memory
- spark.executor.memory
- spark.executor.cores
- spark.executor.memoryOverhead
- spark.executor.instances
- spark.sql.shuffle.partitions

When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust the config entries above according to:
- The number of cuboids to be built
- The (maximum) size of the fact table
- The resources available in the current resource manager's queue

However, users can still override some of these entries at the Cube level via `kylin.engine.spark-conf.XXX`.
For details, see How to improve cube building and query performance.
4.0.0
kylin.engine.spark-conf.
null
Users can set Spark configuration at the Cube level.
4.0.0
kylin.engine.driver-memory-base



kylin.engine.driver-memory-maximum



kylin.engine.driver-memory-strategy



kylin.engine.base-executor-instance



kylin.engine.spark.required-cores



kylin.engine.executor-instance-strategy



kylin.engine.retry-memory-gradient



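The resolution order described in this table (automatic adjustment first, then Cube-level overrides via `kylin.engine.spark-conf.XXX`) can be illustrated with a minimal Python sketch. It is not Kylin's implementation: `parse_overrides` and `resolve_spark_conf` are hypothetical names, the auto-adjusted values are passed in as a plain dict instead of being computed from cuboid count, fact table size and queue resources, the parser is simplified to `key=value` lines only, and the sample values (`4G`, `8G`, `2`) are illustrative. Only the `kylin.engine.spark-conf.` prefix and the rule that user overrides win come from this page.

```python
"""Hypothetical sketch of layering Cube-level Spark overrides on auto-adjusted values.

Only the "kylin.engine.spark-conf." prefix and the "overrides win" rule come from
the Kylin wiki; names, sample values and the simplified parser are assumptions.
"""

OVERRIDE_PREFIX = "kylin.engine.spark-conf."


def parse_overrides(properties_text):
    """Return {spark_key: value} for every kylin.engine.spark-conf.<spark_key>=<value> line."""
    overrides = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        key = key.strip()
        if sep and key.startswith(OVERRIDE_PREFIX):
            overrides[key[len(OVERRIDE_PREFIX):]] = value.strip()
    return overrides


def resolve_spark_conf(auto_adjusted, properties_text, auto_prior=True):
    """Start from auto-adjusted values (only if auto_prior), then let user overrides win."""
    # With auto_prior off, nothing is auto-adjusted; unset keys fall back to Spark defaults.
    conf = dict(auto_adjusted) if auto_prior else {}
    conf.update(parse_overrides(properties_text))
    return conf


if __name__ == "__main__":
    props = """
kylin.spark-conf.auto.prior=true
kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
"""
    auto = {"spark.executor.memory": "4G", "spark.executor.cores": "2"}
    print(resolve_spark_conf(auto, props))
```

Splitting on the first `=` matters because values such as `-Dhdp.version=current` contain `=` themselves; `kylin.spark-conf.auto.prior` is ignored by the override parser because it does not carry the `kylin.engine.spark-conf.` prefix.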
Global dictionary



Data skew



Please remove the following:

Property | Default | Description | Version
kylin.engine.spark.build-class-name
org.apache.kylin.engine.spark.job.CubeBuildJob
For developers only. The class name used in spark-submit.

4.0.0-ALPHA

kylin.engine.spark.cluster-info-fetcher-class-name
org.apache.kylin.cluster.YarnInfoFetcher
For developers only. Fetches YARN information for the Spark job.

4.0.0-ALPHA

kylin.engine.spark-conf.XXX

  1. Before Kylin submits a cubing job, some major properties (cores and memory) are adjusted automatically (if kylin.spark-conf.auto.prior is set to true).
  2. After the automatic adjustment, the Spark conf is overwritten by this property. For example, to set spark.driver.extraJavaOptions=-Dhdp.version=current, add the following line to kylin.properties:
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current

4.0.0-ALPHA

kylin.storage.provider
org.apache.kylin.common.storage.DefaultStorageProvider

The content summary objects returned by different cloud vendors differ, so a vendor-specific implementation must be provided.

For more details, refer to org.apache.kylin.common.storage.IStorageProvider.

4.0.0-ALPHA

kylin.engine.spark.merge-class-name
org.apache.kylin.engine.spark.job.CubeMergeJob
For developers only. The class name used in spark-submit.

4.0.0-ALPHA

kylin.engine.spark.task-impact-instance-enabled
true

UPDATING

4.0.0-ALPHA

kylin.engine.spark.task-core-factor
3

UPDATING

4.0.0-ALPHA

kylin.engine.driver-memory-base
1024
Automatically adjusts spark.driver.memory for the build engine if kylin.engine.spark-conf.spark.driver.memory is not set.



4.0.0-ALPHA

kylin.engine.driver-memory-strategy
{"2", "20", "100"}
UPDATING

4.0.0-ALPHA

kylin.engine.driver-memory-maximum
4096

UPDATING

4.0.0-ALPHA

kylin.engine.persist-flattable-threshold
1
If the number of cuboids to be built from the flat table is greater than this threshold, the flat table is persisted to $HDFS_WORKING_DIR/job_tmp/flat_table to save memory.

4.0.0-ALPHA

kylin.snapshot.parallel-build-timeout-seconds
3600
Timeout (in seconds) for parallel snapshot building, which is used to speed up snapshot building.


4.0.0-ALPHA

kylin.snapshot.parallel-build-enabled
true

UPDATING






kylin.spark-conf.auto.prior
true
Enables adaptive adjustment of Spark parameters.

4.0.0-ALPHA

kylin.engine.submit-hadoop-conf-dir
/etc/hadoop/conf

Set HADOOP_CONF_DIR for spark-submit.

4.0.0-ALPHA

kylin.storage.columnar.shard-size-mb
128

The maximum size (MB) of a pre-calculated cuboid Parquet file.

4.0.0-ALPHA

kylin.storage.columnar.shard-rowcount

2500000

The maximum number of rows in a pre-calculated cuboid Parquet file.

4.0.0-ALPHA

kylin.storage.columnar.shard-countdistinct-rowcount
1000000
The maximum number of rows in a pre-calculated cuboid Parquet file when the cuboid has a bitmap measure (cuboids with bitmap measures are large).

4.0.0-ALPHA

kylin.query.spark-engine.join-memory-fraction
0.3
Limits the memory used by Sparder broadcast joins (broadcast joins can cause instability).

4.0.0-ALPHA
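This page does not say how kylin.engine.driver-memory-strategy is applied. The Python sketch below assumes one plausible reading, which should be checked against the Kylin source before relying on it: each strategy value is a cuboid-count breakpoint, every breakpoint the cuboid count exceeds adds one more kylin.engine.driver-memory-base, and the result is capped at kylin.engine.driver-memory-maximum. Only the defaults (1024, {"2", "20", "100"}, 4096) and the rule that an explicit kylin.engine.spark-conf.spark.driver.memory disables the adjustment come from this page; the MB unit and the function names `driver_memory_mb` and `effective_driver_memory` are assumptions.

```python
"""Hypothetical sketch of adaptive driver-memory sizing for a build job.

ASSUMPTIONS: strategy values are cuboid-count breakpoints and memory is in MB.
This illustrates one reading of the properties; it is not Kylin's code.
"""

DRIVER_MEMORY_BASE_MB = 1024           # kylin.engine.driver-memory-base (default)
DRIVER_MEMORY_STRATEGY = (2, 20, 100)  # kylin.engine.driver-memory-strategy (default)
DRIVER_MEMORY_MAXIMUM_MB = 4096        # kylin.engine.driver-memory-maximum (default)


def driver_memory_mb(cuboid_count, base_mb=DRIVER_MEMORY_BASE_MB,
                     strategy=DRIVER_MEMORY_STRATEGY,
                     maximum_mb=DRIVER_MEMORY_MAXIMUM_MB):
    # Count how many breakpoints the cuboid count exceeds (assumed semantics).
    steps = sum(1 for threshold in strategy if cuboid_count > threshold)
    # One extra base allotment per exceeded breakpoint, never above the maximum.
    return min(base_mb * (1 + steps), maximum_mb)


def effective_driver_memory(user_spark_conf, cuboid_count):
    # user_spark_conf holds keys with the kylin.engine.spark-conf. prefix stripped.
    # Documented rule: adjust only when spark.driver.memory was not set explicitly.
    if "spark.driver.memory" in user_spark_conf:
        return user_spark_conf["spark.driver.memory"]
    return f"{driver_memory_mb(cuboid_count)}m"


if __name__ == "__main__":
    for n in (1, 10, 50, 500):
        print(n, effective_driver_memory({}, n))
    print(effective_driver_memory({"spark.driver.memory": "6g"}, 500))
```

Under these assumptions and the defaults, 1, 10, 50 and 500 cuboids map to 1024m, 2048m, 3072m and 4096m respectively, while an explicit spark.driver.memory is returned unchanged.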

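The three shard properties each cap one dimension of a cuboid's Parquet output: size in MB, row count, and a smaller row count for cuboids with a bitmap measure. A minimal Python sketch of deriving a file count from them, assuming evenly sized files and taking whichever cap demands more files; Kylin's own repartitioning logic may combine the limits differently, and the role of kylin.storage.columnar.repartition-threshold-size-mb is not modeled because this page does not describe it. The function name `target_file_count` is hypothetical.

```python
"""Hypothetical sketch: choosing how many Parquet files to write for one cuboid.

ASSUMPTION: the file count is the smallest number that keeps the average file
under both the size cap and the row cap. Kylin's real logic may differ.
"""
import math

SHARD_SIZE_MB = 128                        # kylin.storage.columnar.shard-size-mb
SHARD_ROWCOUNT = 2_500_000                 # kylin.storage.columnar.shard-rowcount
SHARD_COUNTDISTINCT_ROWCOUNT = 1_000_000   # kylin.storage.columnar.shard-countdistinct-rowcount


def target_file_count(size_mb, row_count, has_bitmap_measure=False,
                      shard_size_mb=SHARD_SIZE_MB,
                      shard_rowcount=SHARD_ROWCOUNT,
                      countdistinct_rowcount=SHARD_COUNTDISTINCT_ROWCOUNT):
    # Cuboids with a bitmap measure use the smaller row cap.
    row_cap = countdistinct_rowcount if has_bitmap_measure else shard_rowcount
    by_size = math.ceil(size_mb / shard_size_mb)
    by_rows = math.ceil(row_count / row_cap)
    # At least one file, and enough files to satisfy the stricter of the two caps.
    return max(1, by_size, by_rows)


if __name__ == "__main__":
    print(target_file_count(1000, 10_000_000))        # 8 (size-bound)
    print(target_file_count(1000, 10_000_000, True))  # 10 (row-bound, bitmap measure)
```

Taking the maximum keeps both documented limits satisfied on average; an averaging or weighted scheme would trade slightly larger files for fewer of them.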