
Apache Kylin : Analytical Data Warehouse for Big Data


Setting a shard by column can improve query concurrency and query efficiency.

We suggest choosing a high-cardinality column as the shard by column, so that the build engine repartitions the cuboid by that column and files that are not in range can be filtered out.
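
To make the idea of repartitioning and file filtering concrete, here is a minimal sketch in Python. It is not Kylin's implementation: the shard count, the toy hash function, and the helper names (`shard_of`, `repartition`, `files_to_scan`) are all assumptions made for illustration. It only shows why a filter on the shard by column lets most files be skipped.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_of(value, num_shards=NUM_SHARDS):
    """Map a shard-by value to a shard index (toy hash, not Kylin's)."""
    return sum(str(value).encode("utf-8")) % num_shards


def repartition(rows, column, num_shards=NUM_SHARDS):
    """Group rows into shard 'files' keyed by the hash of the shard-by column."""
    files = defaultdict(list)
    for row in rows:
        files[shard_of(row[column], num_shards)].append(row)
    return dict(files)


def files_to_scan(values, num_shards=NUM_SHARDS):
    """Return the shard indexes that can contain any of the filtered values."""
    return sorted({shard_of(v, num_shards) for v in values})


# Eight rows with distinct seller_id values, spread over the shards.
rows = [{"seller_id": 10000230 + i, "price": i} for i in range(8)]
files = repartition(rows, "seller_id")

# An equality filter on seller_id only needs the one shard that value hashes to.
print(files_to_scan([10000233]))
```

With a low-cardinality column, many rows would hash to the same few shards, so skipping files would save little; this is the intuition behind preferring a high-cardinality column.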

Attention

Before configuring a shard by column, pay attention to the following:

  • Only one shard by column is currently supported, so we suggest choosing a column with high cardinality
  • The shard by column must not be a derived column. See more about derived column.

How to configure shard by column

For example, suppose there is a high-cardinality column called seller_id, and our application scenario filters on this column. Here are some sample SQL statements:

select count(*) from kylin_sales left join kylin_order where seller_id = '10000233'
select count(*) from kylin_sales left join kylin_order where SELLER_ID in (10000233,10000234,10000235)
select count(*) from kylin_sales left join kylin_order where SELLER_ID is NULL
select count(*) from kylin_sales left join kylin_order where SELLER_ID in (10000233,10000234,10000235) and SELLER_ID = 10000233
select count(*) from kylin_sales left join kylin_order where SELLER_ID = 10000233 or SELLER_ID = 1
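
The sample filters above all narrow seller_id to a small set of values. The following Python sketch shows one way such a set could be derived from a filter; the tuple-based predicate encoding and the function name `candidate_values` are hypothetical, and whether Kylin handles each filter form exactly this way is an assumption, not something this page states.

```python
def candidate_values(pred):
    """Return the set of shard-by values a predicate can match.

    Predicates are small tuples: ("eq", v), ("in", [v, ...]), ("is_null",),
    ("and", p, q), ("or", p, q). None stands for SQL NULL.
    """
    kind = pred[0]
    if kind == "eq":
        return {pred[1]}
    if kind == "in":
        return set(pred[1])
    if kind == "is_null":
        return {None}
    if kind == "and":
        # Both sides must hold, so only values allowed by both remain.
        return candidate_values(pred[1]) & candidate_values(pred[2])
    if kind == "or":
        # Either side may hold, so values from both sides are candidates.
        return candidate_values(pred[1]) | candidate_values(pred[2])
    raise ValueError(f"unsupported predicate: {kind}")


# SELLER_ID in (10000233,10000234,10000235) and SELLER_ID = 10000233
in_and_eq = ("and", ("in", [10000233, 10000234, 10000235]), ("eq", 10000233))
# SELLER_ID = 10000233 or SELLER_ID = 1
eq_or_eq = ("or", ("eq", 10000233), ("eq", 1))

print(candidate_values(in_and_eq))
print(candidate_values(eq_or_eq))
```

The smaller the candidate set, the fewer shards a query would need to read.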


Step 1

Edit the cube and add the dimension seller_id. Remember that the dimension type should be normal, not derived.

Step 2

From Cube Designer → Advanced Setting → Rowkeys, find the column seller_id and set shard by to true. Because only one shard by column is currently supported, only one column should have shard by set to true.
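
The two rules from the Attention section (at most one shard by column, and never a derived column) can be checked with a small script before building. This is only a sketch: the dictionary layout and the field names `column` and `isShardBy` are assumptions modeled loosely on a rowkey definition, and `validate_shard_by` is a hypothetical helper, not part of Kylin.

```python
def validate_shard_by(rowkey_columns, derived_columns=()):
    """Check the shard by rules and return the shard by column, or None.

    rowkey_columns: list of dicts with assumed keys "column" and "isShardBy".
    derived_columns: names of dimensions defined as derived.
    """
    shard_cols = [c["column"] for c in rowkey_columns if c.get("isShardBy")]
    if len(shard_cols) > 1:
        raise ValueError(f"only one shard by column is supported, got {shard_cols}")
    for col in shard_cols:
        if col in derived_columns:
            raise ValueError(f"shard by column {col} must not be a derived column")
    return shard_cols[0] if shard_cols else None


# Hypothetical rowkey settings with seller_id as the only shard by column.
rowkeys = [
    {"column": "SELLER_ID", "isShardBy": True},
    {"column": "PART_DT", "isShardBy": False},
]
print(validate_shard_by(rowkeys))
```

A configuration with two columns marked as shard by, or with a derived column marked, raises a `ValueError` instead of returning a column name.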


Step 3

Build the cube. 


Storage

