
Part I. Why Kylin on Parquet

Benefits

Compared to the previous Kylin architecture, the main changes include the following:

...

Cuboids are saved to HDFS (or another file system) in Parquet format; HBase is no longer needed.

Storage Engine

Currently, Kylin uses Apache HBase as its storage engine. HBase storage is very fast, but it also has some drawbacks:

...

Benchmark Report for Parquet Storage

Query Engine

Kylin 3.X and lower versions

Computation pressure concentrates on the query node, which becomes a single bottleneck
The code generated by Calcite is hard to debug

Kylin 4.X

Fully distributed
Easy to debug: a breakpoint can be added at each DataFrame

...

Part II. How Kylin on Parquet Works

Code design diagram and analysis

...

Columns [id, name, age] correspond to dimensions [2, 1, 0]; measures [COUNT, SUM] correspond to [3, 4].
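The id mapping in the example above can be sketched in a few lines of Python. This is only an illustration: the helper names (build_column_ids, stored_column_order) are hypothetical and not part of Kylin, and the sketch assumes that each dimension and measure is addressed by its numeric id written as a string.

```python
from typing import Dict, List

# Ids taken from the example in the text: id -> 2, name -> 1, age -> 0,
# COUNT -> 3, SUM -> 4.
DIMENSIONS: Dict[str, int] = {"id": 2, "name": 1, "age": 0}
MEASURES: Dict[str, int] = {"COUNT": 3, "SUM": 4}


def build_column_ids(dimensions: Dict[str, int], measures: Dict[str, int]) -> Dict[str, str]:
    """Merge dimension and measure ids into one name -> id lookup (hypothetical helper)."""
    overlap = set(dimensions.values()) & set(measures.values())
    if overlap:
        raise ValueError(f"ids used by both a dimension and a measure: {sorted(overlap)}")
    merged = {**dimensions, **measures}
    # Assumption: the stored column is addressed by the id rendered as a string.
    return {name: str(idx) for name, idx in merged.items()}


def stored_column_order(names: List[str], column_ids: Dict[str, str]) -> List[str]:
    """Translate logical column names into the ids used in storage."""
    return [column_ids[name] for name in names]


COLUMN_IDS = build_column_ids(DIMENSIONS, MEASURES)
```

Calling stored_column_order(["id", "name", "age", "COUNT", "SUM"], COLUMN_IDS) translates the logical names into the ids listed in the example.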

Query optimization

FilePruner

Effect

Prune segments by the partition column (Date type)
Prune cuboid Parquet files by the shard by column
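Segment pruning by partition column can be pictured as a date-range overlap test. The sketch below is a simplified model, not Kylin's FilePruner code: the Segment class and prune_segments function are hypothetical, and it assumes each segment covers a half-open date range [start, end).

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a cube segment with a half-open range [start, end)."""
    name: str
    start: date
    end: date


def prune_segments(segments: List[Segment], query_start: date, query_end: date) -> List[Segment]:
    """Keep only the segments whose range overlaps the query range [query_start, query_end)."""
    return [s for s in segments if s.start < query_end and query_start < s.end]


segments = [
    Segment("2020-01", date(2020, 1, 1), date(2020, 2, 1)),
    Segment("2020-02", date(2020, 2, 1), date(2020, 3, 1)),
    Segment("2020-03", date(2020, 3, 1), date(2020, 4, 1)),
]

# A filter on the partition column covering 10 to 20 February only touches one segment.
kept = prune_segments(segments, date(2020, 2, 10), date(2020, 2, 20))
```

Segments that do not overlap the filtered date range are skipped entirely, so their Parquet files are never opened.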

How to use

Pruning by partition column is automatic: the date range in the query is analysed to prune segments.
Pruning by shard by column:
1. Identify the column to shard by. It is usually a column used in the where clause. For example, for "select count from kylin_sales left join kylin_order where seller_id = '100041'", the "shard by" column is seller_id.
2. Edit the cube. The shard by column must be set as a normal column, not a derived column.
3. Set "Shard by" to true in "Cube edit" -> "Advanced Setting" -> "Rowkey".
4. Set "kylin.storage.columnar.shard-rowcount" in kylin.properties; the default value is 2500000. This property is used to split the cuboid file into multiple files so that unneeded files can be filtered out at query time.
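The effect of the shard by setting can be sketched as follows. This is a rough model under stated assumptions, not Kylin's implementation: it assumes the number of files is the cuboid row count divided by the shard row count, rounded up, and it uses CRC32 as a stand-in hash; the function names and file names are hypothetical.

```python
import math
import zlib
from typing import List

# Default of kylin.storage.columnar.shard-rowcount, as stated in the text.
DEFAULT_SHARD_ROWCOUNT = 2_500_000


def shard_count(cuboid_rows: int, shard_rowcount: int = DEFAULT_SHARD_ROWCOUNT) -> int:
    """Assumed rule: one file per shard_rowcount rows, with at least one file."""
    return max(1, math.ceil(cuboid_rows / shard_rowcount))


def shard_of(value: str, num_shards: int) -> int:
    """Map a shard by value to a shard id. CRC32 is a stand-in, not Kylin's actual hash."""
    return zlib.crc32(value.encode("utf-8")) % num_shards


def files_to_scan(files: List[str], shard_by_value: str) -> List[str]:
    """For an equality filter on the shard by column, only one file can contain matches."""
    return [files[shard_of(shard_by_value, len(files))]]


# Hypothetical cuboid with 6,000,000 rows, split into three files.
num_files = shard_count(6_000_000)
files = [f"part-{i}.parquet" for i in range(num_files)]
selected = files_to_scan(files, "100041")
```

With a filter such as seller_id = '100041', only the file for the matching shard is read and the remaining files are skipped. A smaller shard row count produces more, smaller files, which prunes more aggressively at the cost of more files to manage.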

Limitation:

For now, shard by is set at the cube level, so there can be only one shard by column. In the future, we may support multiple shard by columns at the cuboid level. Suggestions from community users are welcome.
