
...


First of all, I would like to share why Youzan chose to upgrade to Kylin 4. Here, let me briefly review the history of Youzan's OLAP infrastructure.

In the early days of Youzan, in order to iterate on development quickly, we chose the approach of pre-computation + MySQL; in 2018, Druid was introduced for its query flexibility and development efficiency, but it had problems such as a low degree of pre-aggregation and no support for precise count distinct measures. In this context, Youzan introduced Apache Kylin and ClickHouse. Kylin supports a high degree of aggregation, precise count distinct measures and the lowest RT, while ClickHouse is quite flexible in usage (ad hoc queries).

From the introduction of Kylin in 2018 to now, Youzan has used Kylin for more than three years. With the continuous enrichment of business scenarios and the continuous accumulation of data volume, Youzan currently has 6 million merchants, its GMV in 2020 was 107.3 billion, and the daily build data volume is 10 billion+. At present, Kylin has basically covered all the business scenarios of Youzan.

...


First of all, let's introduce the main advantages of Kylin 4. Apache Kylin 4 depends entirely on Spark for cubing jobs and queries. It can make full use of Spark's parallelization, vectorization and whole-stage code generation technologies to improve the efficiency of large queries.
Here is a brief introduction to the principles of Kylin 4, namely its storage engine, build engine and query engine.

Storage engine


First of all, let's take a look at the new storage engine and compare Kylin on HBase with Kylin on Parquet. In Kylin on HBase, cuboid data is stored in HBase tables: a single segment corresponds to one HBase table, and aggregation is pushed down to the HBase coprocessor.

But as we know, HBase is not a real columnar storage and its throughput is not enough for an OLAP system. Kylin 4 replaces HBase with Parquet: all the data is stored in files, and each segment has a corresponding HDFS directory. All queries and cubing jobs read and write files without HBase. Although there is a certain performance loss for simple queries, the improvement for complex queries is considerable and worthwhile.
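
As a minimal illustration of the file-based storage, the sketch below reads one cuboid of one segment directly with Spark. The HDFS path, cube and segment names are hypothetical; the real layout depends on the configured Kylin working directory and the cube/segment/cuboid identifiers.

    import org.apache.spark.sql.SparkSession

    // Kylin 4 keeps cuboid data as plain Parquet files on HDFS, so any Spark
    // session can read a segment's data directly -- no HBase is involved.
    val spark = SparkSession.builder()
      .appName("read-kylin4-cuboid-sketch")
      .getOrCreate()

    // Hypothetical path; substitute your own working directory, cube, segment
    // and cuboid id.
    val cuboidPath = "hdfs:///kylin/kylin_metadata/parquet/my_cube/my_segment/20001"
    val cuboid = spark.read.parquet(cuboidPath)
    cuboid.printSchema()
    cuboid.show(10)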

Build engine

The second is the new build engine. Based on our tests, the build time of Kylin on Parquet has been reduced from 82 minutes to 15 minutes. There are several reasons:

...

  • Kylin on Parquet changes the granularity of cubing to the cuboid level, which helps further improve the parallelism of the cubing job.
  • Enhanced implementation of the global dictionary. In the new algorithm, dictionary entries and source data are hashed into the same buckets, making it possible to encode a piece of source data by loading only the corresponding dictionary bucket (see the sketch after this list).
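
To make the bucket idea concrete, here is a simplified sketch, not Kylin's actual implementation: the value set, bucket count and helper names are made up. Dictionary values and source rows are hashed with the same function, so one slice of source data can be encoded by loading a single dictionary bucket.

    // Simplified illustration of the bucketed global dictionary (hypothetical code).
    val numBuckets = 4

    def bucketOf(value: String): Int =
      (value.hashCode % numBuckets + numBuckets) % numBuckets

    // Pretend this global dictionary was built in an earlier step and is stored
    // as one small dictionary per bucket.
    val allValues = Seq("user_a", "user_b", "user_c", "user_d", "user_e")
    val bucketDictionaries: Map[Int, Map[String, Long]] =
      allValues.zipWithIndex
        .map { case (v, id) => (bucketOf(v), v, id.toLong) }
        .groupBy(_._1)
        .map { case (bucket, entries) => bucket -> entries.map(e => e._2 -> e._3).toMap }

    // Encoding a slice of source data only touches the buckets it hashes to,
    // instead of loading the whole dictionary into memory.
    val sourceSlice = Seq("user_c", "user_e", "user_c")
    val encoded = sourceSlice.map { v => bucketDictionaries(bucketOf(v))(v) }
    println(encoded)   // e.g. List(2, 4, 2)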

As you can see on the right, after upgrading to Kylin 4, the cubing job goes from ten steps down to two steps, and the build performance improvement is very obvious.

Query engine


Next is the new query engine of Kylin 4. As you can see, the calculation in Kylin on HBase depends entirely on the HBase coprocessor and the query server process. When data is read from HBase into the query server to do aggregation, sorting, etc., the single query server becomes the bottleneck. Kylin 4 switches to a fully distributed query mechanism based on Spark; what's more, it is able to tune the Spark configuration automatically in the query step!

How to optimize performance of Kylin 4

...

In Kylin 4, SQL is analyzed, optimized and code-generated in Calcite. This step takes about 150ms for some queries. We have added PreparedStatementCache support in Kylin 4 to cache the Calcite plan, so that the same parameterized SQL does not have to repeat this step. This optimization saves about 150ms of query time.
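
As an illustration from the client side, the sketch below issues the same parameterized query through a JDBC PreparedStatement against Kylin, so that repeated executions can reuse the cached Calcite plan. The host, port, project, credentials, table and column names are placeholders.

    import java.sql.DriverManager

    // Connect through the Kylin JDBC driver; all connection details are placeholders.
    Class.forName("org.apache.kylin.jdbc.Driver")
    val conn = DriverManager.getConnection(
      "jdbc:kylin://kylin-host:7070/my_project", "ADMIN", "KYLIN")

    val ps = conn.prepareStatement(
      "SELECT part_dt, COUNT(DISTINCT buyer_id) AS uv " +
      "FROM kylin_sales WHERE seller_id = ? GROUP BY part_dt")

    ps.setLong(1, 10000123L)
    val first = ps.executeQuery()    // first execution pays the Calcite analyze/optimize cost

    ps.setLong(1, 10000456L)
    val second = ps.executeQuery()   // later executions of the same statement can hit the plan cache

    while (second.next())
      println(second.getString(1) + "\t" + second.getLong(2))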

Tuning Spark configuration


Kylin 4 uses Spark as its query engine. As Spark is a distributed engine designed for massive data processing, it inevitably loses some performance on small queries. We have done some tuning to catch up with the small-query latency of Kylin on HBase.

Our first optimization is to make more of the processing finish in memory. The key is to avoid data spills during aggregation, shuffle and sort. Tuning the following configurations is helpful (see the sketch after this list).

  1. set "spark.sql.objectHashAggregate.sortBased.fallbackThreshold" to a bigger larger value to avoid HashAggregate fall back to Sort Based Aggregate, which really kills performance when happens.
  2. set "spark.shuffle.spill.initialMemoryThreshold" to an a large " to value to avoid to many spills during shuffle.

...

Thirdly, we use a RAM disk to enhance shuffle performance: mount the RAM disk as tmpfs and set spark.local.dir to a directory on it.
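
A sketch of pointing Spark's scratch space at a tmpfs mount is shown below; the mount point and size are placeholders chosen for illustration. Note that some cluster managers (for example YARN) override spark.local.dir with their own local directories, in which case the RAM disk has to be configured there instead.

    import org.apache.spark.sql.SparkSession

    // On each node, a RAM disk is mounted first, e.g.:
    //   mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
    val spark = SparkSession.builder()
      .appName("kylin4-ramdisk-shuffle-sketch")
      // shuffle and spill files land on the RAM disk
      .config("spark.local.dir", "/mnt/ramdisk/spark")
      .getOrCreate()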

Lastly, we disabled Spark's whole-stage code generation for small queries, because whole-stage code generation costs about 100ms~200ms, while it brings no benefit to small queries that are just simple projections.
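
The corresponding switch is sketched below; how Kylin decides which queries count as small is not shown here, this only illustrates the Spark setting itself.

    import org.apache.spark.sql.SparkSession

    // Disable whole-stage code generation on the session that serves small queries.
    val spark = SparkSession.builder().getOrCreate()
    spark.conf.set("spark.sql.codegen.wholeStage", "false")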

Parquet optimization

...

The first principle is that we'd better always include the shard-by column in our filter conditions, because the Parquet files are sharded by the shard-by column, so filtering on it reduces the number of data files to read.
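
For example, assuming a hypothetical cube on kylin_sales whose shard-by column is seller_id (table and column names are made up for illustration), the first query below lets Kylin prune shard files, while the second has to scan every shard file of the segment:

    // Filter includes the shard-by column (seller_id): only matching shard files are read.
    val prunedQuery =
      """SELECT part_dt, SUM(price)
        |FROM kylin_sales
        |WHERE seller_id = 10000123 AND part_dt >= DATE '2021-06-01'
        |GROUP BY part_dt""".stripMargin

    // No shard-by column in the filter: every shard file must be scanned.
    val fullScanQuery =
      """SELECT part_dt, SUM(price)
        |FROM kylin_sales
        |WHERE part_dt >= DATE '2021-06-01'
        |GROUP BY part_dt""".stripMargin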

...

Kylin builds the cube layer by layer. For a parent layer with multiple child cuboids to build, we can choose to cache the parent dataset by setting kylin.engine.spark.parent-dataset.max.persist.count to a number greater than 0. But notice that if you set this value too small, it will affect the parallelism of the build job, as the build granularity is at the cuboid level.

...

Performance of Kylin 4 on Youzan online system


After migrating the metadata to Kylin 4, let's share the qualitative changes and substantial performance improvements in some of the most promising scenarios. First of all, in a scenario like Commodity Insight, a large store can have hundreds of thousands of commodities, and we have to analyze its transactions, traffic, and so on. There are more than a dozen precise count distinct measures in a single cube. A precise count distinct measure is very inefficient if it is not optimized through pre-calculation and Bitmap; Kylin currently uses Bitmap to support precise count distinct measures. In a scenario that requires complex queries to sort hundreds of thousands of commodities by various UVs (precise count distinct measures), the RT of Kylin 2 was 27 seconds, while with Kylin 4 it is reduced to less than 2 seconds.

...

Plan for Kylin 4 in Youzan

We have tested Apache Kylin 4 thoroughly, fixed several bugs and made improvements over several months. Now we are migrating cubes from the older version to the newer version. For the cubes already migrated to Kylin 4, small-query performance meets our expectations, and complex-query and build performance brought us a big surprise. We are planning to migrate all cubes from the older version to Kylin 4.

Reference

...