Apache Kylin : Analytical Data Warehouse for Big Data
Setting a shard by column can improve both query concurrency and query efficiency.
We suggest choosing a high-cardinality column as the shard by column, so that the build engine repartitions the cuboid data by that column and files that are not in range can be filtered out.
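To make the idea concrete, here is a minimal Python sketch of hash-based sharding and shard pruning. It is an illustration only, not Kylin's build or query engine: the shard count, the CRC32 hash, and all function names are assumptions chosen for the example.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(value, num_shards=NUM_SHARDS):
    """Map a column value to a shard id using a deterministic hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def build_shards(rows, shard_by, num_shards=NUM_SHARDS):
    """Repartition rows so that equal shard-by values land in the same shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[shard_by], num_shards)].append(row)
    return shards


def shards_to_scan(values, num_shards=NUM_SHARDS):
    """Return the only shard ids that can contain rows matching `col IN values`."""
    return {shard_of(v, num_shards) for v in values}


def count_where_in(shards, shard_by, values, num_shards=NUM_SHARDS):
    """Count matching rows while reading only the shards that can contain them.

    Returns (matched_rows, scanned_rows) so the pruning effect is visible.
    """
    wanted = set(values)
    matched = scanned = 0
    for shard_id in shards_to_scan(wanted, num_shards):
        for row in shards.get(shard_id, []):
            scanned += 1
            if row[shard_by] in wanted:
                matched += 1
    return matched, scanned


if __name__ == "__main__":
    rows = [{"seller_id": 10000000 + i} for i in range(1000)]
    shards = build_shards(rows, "seller_id")
    matched, scanned = count_where_in(shards, "seller_id", [10000233])
    print(f"matched={matched}, scanned={scanned} of {len(rows)} rows")
```

In this sketch, a high-cardinality column spreads rows across shards fairly evenly, so an equality or IN filter on that column reads only a fraction of the data. A low-cardinality column would leave a few large shards and much less to prune.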
Attention
Before configuring a shard by column, note the following:
- Only one shard by column is currently supported, so we suggest choosing a column with high cardinality.
- The shard by column must not be a derived column. See more about derived column.
How to configure shard by column
For example, suppose there is a high-cardinality column called seller_id, and the application scenario filters on this column. Here are some sample SQL statements:
select count(*) from kylin_sales left join kylin_order where seller_id = '10000233'
select count(*) from kylin_sales left join kylin_order where SELLER_ID in (10000233,10000234,10000235)
select count(*) from kylin_sales left join kylin_order where SELLER_ID is NULL
select count(*) from kylin_sales left join kylin_order where SELLER_ID in (10000233,10000234,10000235) and SELLER_ID = 10000233
select count(*) from kylin_sales left join kylin_order where SELLER_ID = 10000233 or SELLER_ID = 1
Step 1
Edit the cube and add the dimension seller_id. Remember that the dimension type should be normal, not derived.
Step 2
In Cube Designer → Advanced Setting → Rowkeys, find the column seller_id and set shard by to true. Because only one shard by column is currently supported, only one column should have shard by set to true.
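The two constraints on the shard by column (only one, and not derived) can be checked before building. The following Python sketch assumes the rowkey settings are available as a list of dictionaries with `column` and `isShardBy` keys. That layout is modeled loosely on a cube descriptor and is an assumption here, as are the function name and the sample column names.

```python
def validate_shard_by(rowkey_columns, derived_columns=()):
    """Check shard-by settings and return the shard-by column name, or None.

    rowkey_columns: list of dicts with assumed keys "column" and "isShardBy".
    derived_columns: names of dimensions defined as derived (assumed input).
    """
    shard_cols = [c["column"] for c in rowkey_columns if c.get("isShardBy")]
    if len(shard_cols) > 1:
        raise ValueError(
            f"only one shard by column is supported, found: {shard_cols}"
        )
    if shard_cols and shard_cols[0] in set(derived_columns):
        raise ValueError(
            f"shard by column must not be derived: {shard_cols[0]}"
        )
    return shard_cols[0] if shard_cols else None


if __name__ == "__main__":
    # Hypothetical rowkey settings with seller_id as the single shard-by column.
    rowkeys = [
        {"column": "KYLIN_SALES.SELLER_ID", "isShardBy": True},
        {"column": "KYLIN_SALES.PART_DT", "isShardBy": False},
    ]
    print(validate_shard_by(rowkeys))
```

Running a check like this before Step 3 catches a misconfigured rowkey without waiting for a build.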
Step 3
Build the cube.