Apache Kylin : Analytical Data Warehouse for Big Data

...

Optimization of query engine

Cache Calcite physical plan

In Kylin4, Calcite parses and optimizes each SQL statement and generates code for it, which takes about 150 ms for some queries. Kylin4 now supports PreparedStatementCache to cache the Calcite plan, which saves those queries about 150 ms.
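The idea behind caching a compiled plan can be illustrated with a small LRU cache keyed by SQL text. This is only a sketch of the general technique, not Kylin's PreparedStatementCache: `PlanCache` and `fake_compile` are hypothetical names, and `fake_compile` merely stands in for Calcite's parse, optimize, and code-generation steps.

```python
from collections import OrderedDict


class PlanCache:
    """Tiny LRU cache mapping SQL text to an already-compiled plan (illustrative only)."""

    def __init__(self, capacity, compile_fn):
        self.capacity = capacity
        self.compile_fn = compile_fn   # the expensive step we want to skip on a hit
        self._plans = OrderedDict()    # insertion order doubles as recency order
        self.misses = 0

    def get(self, sql):
        if sql in self._plans:
            # Cache hit: mark as most recently used and skip compilation.
            self._plans.move_to_end(sql)
            return self._plans[sql]
        # Cache miss: pay the compilation cost once and remember the result.
        self.misses += 1
        plan = self.compile_fn(sql)
        self._plans[sql] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict the least recently used entry
        return plan


def fake_compile(sql):
    # Hypothetical stand-in for Calcite's parse/optimize/codegen pipeline.
    return ("PLAN", sql.strip().lower())


cache = PlanCache(capacity=2, compile_fn=fake_compile)
first = cache.get("SELECT 1")
second = cache.get("SELECT 1")   # served from the cache, no recompilation
```

A repeated query pays the compilation cost only on its first execution, which is where a saving of the kind described above would come from.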

Tuning Spark configuration

Kylin4 uses Spark as its query engine. Because Spark is a distributed engine designed for massive data processing, it inevitably loses some performance on small queries. We have done some tuning to catch up with Kylin3's latency for small queries.

Our first optimization is to keep more of the processing in memory. The key is to avoid data spills during aggregation, shuffle, and sort. Tuning the following configuration helps.

  1. Set "spark.sql.objectHashAggregate.sortBased.fallbackThreshold" to a larger value so that hash aggregation does not fall back to sort-based aggregation, which severely hurts performance when it happens.
  2. Set "spark.shuffle.spill.initialMemoryThreshold" to a large value to avoid too many spills during shuffle.
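As a rough illustration of the two settings above, the sketch below renders them as property lines. The values are placeholders rather than recommendations, and the `kylin.query.spark-conf.` prefix is an assumption about how Kylin4 forwards Spark settings to the query engine; check it against your deployment's configuration reference.

```python
# Placeholder values for illustration only; tune them for your own workload.
spill_tuning = {
    # Higher threshold: hash aggregation tolerates more distinct keys
    # before falling back to sort-based aggregation.
    "spark.sql.objectHashAggregate.sortBased.fallbackThreshold": "100000",
    # Larger initial in-memory threshold (in bytes) before shuffle data may spill.
    "spark.shuffle.spill.initialMemoryThreshold": str(64 * 1024 * 1024),
}


def to_properties(conf, prefix="kylin.query.spark-conf."):
    """Render Spark settings as property lines; the prefix is an assumption."""
    return [f"{prefix}{key}={value}" for key, value in sorted(conf.items())]


lines = to_properties(spill_tuning)
for line in lines:
    print(line)
```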

Secondly, we route small queries to a Query Server that runs Spark in local mode, because the overhead of task scheduling, shuffle reads, and variable broadcast is proportionally larger for small queries in YARN/Standalone mode.
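The page does not say how a query is classified as small, so the following is only a hypothetical sketch of such a routing decision. The size cutoff, the `estimated_scan_bytes` input, and the pool names are all assumptions made for illustration.

```python
# Hypothetical cutoff: queries expected to scan less than this count as "small".
SMALL_QUERY_BYTES = 64 * 1024 * 1024

# Hypothetical pool names for the two kinds of Query Server.
LOCAL_MODE_POOL = "query-server-local"      # Spark in local mode, no cluster scheduling
CLUSTER_MODE_POOL = "query-server-cluster"  # Spark on YARN/Standalone


def choose_query_server(estimated_scan_bytes):
    """Pick the server pool a query should be routed to, based on its estimated scan size."""
    if estimated_scan_bytes < SMALL_QUERY_BYTES:
        return LOCAL_MODE_POOL
    return CLUSTER_MODE_POOL


small_target = choose_query_server(1024)
large_target = choose_query_server(10 * 1024 ** 3)
```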

Thirdly, we use a RAM disk to improve shuffle performance: mount a RAM disk as tmpfs and set spark.local.dir to a directory on it.
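Before pointing spark.local.dir at a directory, it can be worth confirming that the directory really sits on tmpfs. The sketch below does this by parsing mount-table text in the format of Linux's /proc/mounts. The path `/mnt/ramdisk/spark` and the sample mount table are hypothetical.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point that contains `path`."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount table; on a real host, read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /mnt/ramdisk tmpfs rw,size=8g 0 0
"""

local_dir = "/mnt/ramdisk/spark"  # hypothetical directory on the RAM disk
on_ram_disk = filesystem_type(local_dir, SAMPLE_MOUNTS) == "tmpfs"
spark_conf = {"spark.local.dir": local_dir} if on_ram_disk else {}
```

Note that cluster managers can override spark.local.dir: Spark's documentation states that SPARK_LOCAL_DIRS (Standalone) or LOCAL_DIRS (YARN) take precedence when set.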

Lastly, we disabled Spark's whole-stage code generation for small queries. It costs about 100 ms to 200 ms, and it is unnecessary for small queries that are just a simple projection.
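Whole-stage code generation is controlled by the Spark SQL setting `spark.sql.codegen.wholeStage`. A minimal sketch of a per-query override might look like the following. The `is_small_query` flag is a hypothetical input, since the page does not describe how Kylin4 classifies a query as small.

```python
def codegen_overrides(is_small_query):
    """Return the Spark SQL settings to apply for one query."""
    if is_small_query:
        # Skip whole-stage codegen: its compile time outweighs its benefit here.
        return {"spark.sql.codegen.wholeStage": "false"}
    return {}  # keep Spark's defaults for larger queries


small_overrides = codegen_overrides(True)
large_overrides = codegen_overrides(False)

# With PySpark, assuming a SparkSession named `spark` already exists,
# the overrides could be applied before running the query:
#     for key, value in small_overrides.items():
#         spark.conf.set(key, value)
```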

Parquet optimization

... Please try to complete this part. Shengjun Zheng

Dynamic elimination of partitioning dimensions

...

In our tests, response time in some situations dropped from 20 s to 6 s, and from 10 s to 3 s.

Partition cropping under complex filtering conditions

Users may not need to understand the details of partition filtering; this should be a must-have feature. Shengjun Zheng

Optimization of build engine

...