Apache Kylin : Analytical Data Warehouse for Big Data
Part I What is Hive Global Dictionary
Background
Benefit
- Build the Global Dictionary in a distributed way.
- The Job Server does less work and is therefore more stable.
- One ID: the same dictionary can be reused across the whole ETL pipeline throughout the company.
Part II How to use
Configuration
Conf key | Explanation | Example |
---|---|---|
kylin.dictionary.mr-hive.database | Database that holds the Hive Global Dictionary tables | default |
kylin.dictionary.mr-hive.columns | Comma-separated list of all columns that need a Hive Global Dictionary, each in the form {CUBE_NAME}_{COLUMN_NAME} | KYLIN_SALES_SALES_ID,KYLIN_SALES_BUYER_ID |
kylin.dictionary.mr-hive.table.suffix | Suffix for the Segment Dictionary Table and the Global Dictionary Table | _dict_table |
kylin.dictionary.mr-hive.intermediate.table.suffix | Suffix for the Distinct Value Table | _distinct_value |
kylin.dictionary.mr-hive.columns.reduce.num | Comma-separated key:value pairs, where the key is {CUBE_NAME}_{COLUMN_NAME} and the value is the expected number of reducers | KYLIN_SALES_SALES_ID:3,KYLIN_SALES_BUYER_ID:2 |
kylin.source.hive.databasedir | Directory where Kylin can find the files of the Hive tables | /user/hive/warehouse/lacus.db |
kylin.dictionary.mr-hive.ref.columns | To reuse global dictionaries built by another cube, list the existing dictionaries to refer to here | KYLIN_SALES_SALES_ID,KYLIN_SALES_BUYER_ID |
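As an illustration of how the keys above fit together, here is a minimal Python sketch that renders them as key=value lines. The values are the examples from the table. The helper `build_properties` is hypothetical and not part of Kylin, and where the resulting lines belong (for instance kylin.properties or a cube-level override) is an assumption to check against your deployment.

```python
# Example values copied from the configuration table; adjust them for your cube.
columns = ["KYLIN_SALES_SALES_ID", "KYLIN_SALES_BUYER_ID"]
reducers = {"KYLIN_SALES_SALES_ID": 3, "KYLIN_SALES_BUYER_ID": 2}


def build_properties(columns, reducers, database="default"):
    """Render the Hive Global Dictionary settings as key=value lines.

    Hypothetical helper for illustration only; it is not a Kylin API.
    """
    # A reducer count only makes sense for a column that is actually listed.
    unknown = set(reducers) - set(columns)
    if unknown:
        raise ValueError(f"reducer count given for unlisted columns: {sorted(unknown)}")

    settings = {
        "kylin.dictionary.mr-hive.database": database,
        "kylin.dictionary.mr-hive.columns": ",".join(columns),
        "kylin.dictionary.mr-hive.table.suffix": "_dict_table",
        "kylin.dictionary.mr-hive.intermediate.table.suffix": "_distinct_value",
        "kylin.dictionary.mr-hive.columns.reduce.num": ",".join(
            f"{col}:{reducers[col]}" for col in columns if col in reducers
        ),
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


properties_text = build_properties(columns, reducers)
print(properties_text)
```

Generating both the column list and the reducer map from the same `columns` list keeps the two settings consistent with each other.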
Hive Table
Table | Name Pattern | Explanation |
---|---|---|
Distinct Value Table | ${FLAT_TABLE}_${kylin.dictionary.mr-hive.intermediate.table.suffix} | |
Segment Dictionary Table | ${FLAT_TABLE}_${kylin.dictionary.mr-hive.table.suffix} | |
Global Dictionary Table | ${CUBE_NAME}_${kylin.dictionary.mr-hive.table.suffix} | |
Newly added steps
Serial No | Step Name | Explanation |
---|---|---|
1 | Create hive dictionary table | |
2 | Extract distinct value into Distinct Value Table | |
3 | Build Segment Level Dictionary (MR job-1) | |
4 | Build Segment Level Dictionary (MR job-2) | |
5 | Merge Segment Level Dictionary into Global Dictionary Table | |
6 | Replace/encode Flat Table | |
7 | Cleanup temp table & data | |
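To make the data flow of steps 2 to 6 concrete, here is a small in-memory Python sketch. It is a toy model, not Kylin's MapReduce implementation. The function names, the sample rows, and the ID scheme (new values get consecutive integers after the current maximum ID) are assumptions made for illustration. So is the idea that the segment-level dictionary holds only values that are new to the global dictionary.

```python
def extract_distinct_values(flat_rows, column):
    """Step 2: collect the distinct values of one column in the new segment."""
    return sorted({row[column] for row in flat_rows})


def build_segment_dictionary(distinct_values, global_dict):
    """Steps 3-4 (assumed behaviour): assign fresh IDs to unseen values only."""
    next_id = max(global_dict.values(), default=0) + 1
    new_values = [v for v in distinct_values if v not in global_dict]
    return {value: next_id + offset for offset, value in enumerate(new_values)}


def merge_into_global(global_dict, segment_dict):
    """Step 5: merge the segment dictionary; existing IDs never change."""
    merged = dict(global_dict)
    merged.update(segment_dict)
    return merged


def encode_flat_table(flat_rows, column, global_dict):
    """Step 6: replace the original values in the flat table with integer IDs."""
    return [{**row, column: global_dict[row[column]]} for row in flat_rows]


# Hypothetical state: a global dictionary built by earlier segments.
global_dict = {"buyer_a": 1, "buyer_b": 2}
# Hypothetical flat table rows of the segment being built.
segment_rows = [{"BUYER_ID": "buyer_b"}, {"BUYER_ID": "buyer_c"}, {"BUYER_ID": "buyer_d"}]

distinct = extract_distinct_values(segment_rows, "BUYER_ID")
segment_dict = build_segment_dictionary(distinct, global_dict)
global_dict = merge_into_global(global_dict, segment_dict)
encoded = encode_flat_table(segment_rows, "BUYER_ID", global_dict)
print(encoded)
```

Because a value keeps its ID once assigned, rows encoded in earlier segments stay valid after later merges, which is what allows one dictionary to be shared across segments and cubes.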
Part III Performance Comparison
Overview