Apache Kylin: Analytical Data Warehouse for Big Data
Background
Kylin generates temporary files in HDFS during cube building. In addition, when cubes are purged, dropped, or merged, some Parquet files may be left in HDFS even though they will never be queried again. Although Kylin performs some automated garbage collection, it might not cover all cases, so you can run an offline storage cleanup periodically.
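The idea behind an offline cleanup can be illustrated with a small dry-run sketch: compare the paths found on storage with the paths that metadata still references, and report the difference. This is only an illustration of the principle, not Kylin's actual cleanup logic; the function name `unreferenced_paths` and the example paths are hypothetical.

```python
from typing import Iterable, Set


def unreferenced_paths(on_storage: Iterable[str], referenced: Iterable[str]) -> Set[str]:
    """Return paths that exist on storage but are not referenced by metadata.

    This is a dry run: it only computes candidates and deletes nothing.
    Trailing slashes are stripped so "/a/b/" and "/a/b" compare equal.
    """
    def normalize(path: str) -> str:
        return path.rstrip("/")

    return {normalize(p) for p in on_storage} - {normalize(p) for p in referenced}


if __name__ == "__main__":
    # Hypothetical example: seg2 was merged away but its files remain on storage.
    found = ["/wd/p/parquet/c/seg1/", "/wd/p/parquet/c/seg2"]
    in_use = ["/wd/p/parquet/c/seg1"]
    print(sorted(unreferenced_paths(found, in_use)))
```

Reviewing the candidate list before deleting anything keeps the cleanup safe, because a path is removed only after someone has confirmed that nothing references it.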
Directory tree structure under Kylin 4.0's working dir
Working Dir
- {PROJECT_NAME}
  - parquet [managed by tool]
    - {CUBE_NAME}
      - {SEGMENT_NAME}
        - {CUBOID_ID}
          - parquet files
  - spark_log
    - driver
      - {JOB_ID}
        - driver log of the cubing job
    - executor
      - {JOB_ID}
        - executor logs of the cubing job
  - dict/global_dict [managed by tool]
    - {CUBE_NAME}
      - {COLUMN_NAME}
        - dict files
  - table_snapshot [managed by tool]
    - {SCHEMA_NAME.TABLE_NAME}
      - {JOB_ID}
        - parquet files
  - job_tmp [managed by tool]
    - {JOB_ID}
      - TBD
- cube_statistics
  - {CUBE_NAME}
    - {JOB_ID}
      - seq file of cuboid's HLL
- _sparder_log
  - {DATE}
    - executor logs of the query job
- resources-jdbc
  - TBD
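The layout above can be turned into small path helpers, for example to inspect the directories marked "[managed by tool]" before running a cleanup. The following is a minimal sketch, assuming those directories sit directly under `{PROJECT_NAME}` as the tree suggests; the function names and the example working dir are hypothetical and not part of Kylin.

```python
import posixpath
from typing import List

# Sub-directories the tree labels "[managed by tool]", relative to a project dir.
# Assumption: they are direct children of {PROJECT_NAME}.
MANAGED_SUBDIRS = ("parquet", "dict/global_dict", "table_snapshot", "job_tmp")


def managed_dirs(working_dir: str, project: str) -> List[str]:
    """Build the tool-managed directory paths for one project."""
    return [posixpath.join(working_dir, project, sub) for sub in MANAGED_SUBDIRS]


def cuboid_dir(working_dir: str, project: str, cube: str, segment: str, cuboid_id: int) -> str:
    """Build the directory that holds the parquet files of one cuboid."""
    return posixpath.join(working_dir, project, "parquet", cube, segment, str(cuboid_id))


if __name__ == "__main__":
    # Hypothetical working dir and names, for illustration only.
    for path in managed_dirs("/kylin/working_dir", "my_project"):
        print(path)
    print(cuboid_dir("/kylin/working_dir", "my_project", "my_cube", "my_segment", 255))
```

`posixpath` is used instead of `os.path` so that the HDFS-style paths are joined with forward slashes on any operating system.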
How to use