Apache Kylin : Analytical Data Warehouse for Big Data
Directory tree structure in Kylin 4.0
- Root-dir
- PROJECT_NAME
- cube_statistics
- _sparder_log
- resources-jdbc
Kylin generates temporary files in HDFS during cube building. In addition, when cubes are purged, dropped, or merged, some Parquet files may be left behind in HDFS even though they will never be queried again. Kylin performs some automated garbage collection, but it might not cover all cases, so you can run an offline storage cleanup periodically.
What will be deleted:
- temp job files
hdfs:///kylin/${metadata_url}/${project}/job_tmp
- unused segment cuboid files
hdfs:///kylin/${metadata_url}/${project}/${cube_name}/${non_used_segment}
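To inspect one of these locations before cleaning up, the path pattern above can be expanded for a concrete project. The following is a minimal sketch: the function name `job_tmp_path` and the sample values `kylin_metadata` and `learn_kylin` are placeholders chosen for illustration, not values taken from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the documented job_tmp path pattern.
# $1 = metadata URL identifier, $2 = project name (both are placeholders).
job_tmp_path() {
  local metadata_url="$1" project="$2"
  printf 'hdfs:///kylin/%s/%s/job_tmp\n' "$metadata_url" "$project"
}

# Print the expanded path for the sample values.
job_tmp_path kylin_metadata learn_kylin
```

On a host with the Hadoop client installed, the result can be passed to a standard command such as hdfs dfs -du -s -h "$(job_tmp_path kylin_metadata learn_kylin)" to see how much space the temporary job files occupy.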
Usage:
1. Check which resources can be cleaned up. This will not remove anything:
export KYLIN_HOME=/path/to/kylin_home
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false
2. Pick one or two of the listed resources and verify that they are no longer referenced. Then pass the "--delete true" option to start the cleanup:
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
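For periodic runs, the two steps can be combined into a small wrapper that always performs the dry run first. This is only a sketch under stated assumptions: the function name `kylin_storage_cleanup` and the `CONFIRM_DELETE` variable are hypothetical, and only the `kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete` invocation comes from this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dry run first, delete only when explicitly confirmed.
kylin_storage_cleanup() {
  if [ -z "${KYLIN_HOME:-}" ]; then
    echo "KYLIN_HOME is not set" >&2
    return 1
  fi
  local kylin_sh="${KYLIN_HOME}/bin/kylin.sh"
  local job="org.apache.kylin.tool.StorageCleanupJob"

  # Step 1: dry run, reports what can be cleaned up without removing anything.
  "$kylin_sh" "$job" --delete false || return 1

  # Step 2: real cleanup, only if the caller opted in.
  # CONFIRM_DELETE is an assumed variable name, not a Kylin setting.
  if [ "${CONFIRM_DELETE:-no}" = "yes" ]; then
    "$kylin_sh" "$job" --delete true
  else
    echo "Dry run only; set CONFIRM_DELETE=yes to delete."
  fi
}
```

Calling kylin_storage_cleanup on its own only reports candidates. Calling it as CONFIRM_DELETE=yes kylin_storage_cleanup runs the dry run and then the deletion, so the step of manually checking a few resources should happen before opting in.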