
Apache Kylin : Analytical Data Warehouse for Big Data



Directory tree structure in Kylin 4.0


  • Root-dir
    • {PROJECT_NAME}
      • parquet
        • {CUBE_NAME}
          • {SEGMENT_NAME}
      • spark_log
        • executor logs of cubing jobs
      • dict/global_dict
        • {CUBE_NAME}
          • {COLUMN_NAME}
      • table_snapshot
        • {TABLE_NAME}
          • {JOB_ID}
      • job_tmp
    • cube_statistics
      • {CUBE_NAME}
        • {JOB_ID}
          • sequence file with the HyperLogLog (HLL) counters of the cuboids
    • _sparder_log
      • {DATE}
        • executor logs of query jobs
    • resources-jdbc
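The layout above can be navigated from a shell with a few path helpers. The sketch below is hypothetical: the function names are illustrative, and ROOT_DIR defaults to hdfs:///kylin/kylin_metadata only as an assumption that mirrors the hdfs:///kylin/${metadata_url} prefix used in the cleanup paths further down. Set ROOT_DIR to your own Kylin working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that compose paths following the Kylin 4.0 layout above.
# ROOT_DIR is an assumption: point it at your own Kylin working directory.
ROOT_DIR="${ROOT_DIR:-hdfs:///kylin/kylin_metadata}"

# Root-dir/<project>/parquet/<cube>/<segment>
segment_dir()     { printf '%s/%s/parquet/%s/%s\n' "$ROOT_DIR" "$1" "$2" "$3"; }
# Root-dir/<project>/dict/global_dict/<cube>/<column>
global_dict_dir() { printf '%s/%s/dict/global_dict/%s/%s\n' "$ROOT_DIR" "$1" "$2" "$3"; }
# Root-dir/<project>/table_snapshot/<table>/<job id>
snapshot_dir()    { printf '%s/%s/table_snapshot/%s/%s\n' "$ROOT_DIR" "$1" "$2" "$3"; }
# Root-dir/<project>/job_tmp
job_tmp_dir()     { printf '%s/%s/job_tmp\n' "$ROOT_DIR" "$1"; }
# Root-dir/cube_statistics/<cube>/<job id> (sits directly under Root-dir in the tree)
cube_stats_dir()  { printf '%s/cube_statistics/%s/%s\n' "$ROOT_DIR" "$1" "$2"; }
```

For example, hdfs dfs -du -s -h "$(job_tmp_dir learn_kylin)" would report how much space the temporary job files of a project named learn_kylin occupy.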




Kylin generates temporary files in HDFS while building cubes. In addition, when cubes are purged, dropped, or merged, some parquet files may be left in HDFS even though they will never be queried again. Although Kylin already performs some automatic garbage collection, it may not cover every case, so you can run an offline storage cleanup periodically:

The following will be deleted:

  • temporary job files

            hdfs:///kylin/${metadata_url}/${project}/job_tmp

  • cuboid files of unused segments

            hdfs:///kylin/${metadata_url}/${project}/${cube_name}/${non_used_segment}
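To make "unused segment" concrete: roughly speaking, a segment directory is a cleanup candidate when it still exists on storage but no segment in the cube metadata refers to it anymore. The conceptual sketch below expresses that as a set difference over two name lists. It is not how StorageCleanupJob is implemented, the function name is hypothetical, and how you obtain the two lists (for example from hdfs dfs -ls and from the cube's segment list) is left as an assumption.

```shell
#!/usr/bin/env bash
# Conceptual sketch only: this is NOT how StorageCleanupJob is implemented.
# Prints segment directory names that exist on storage but are no longer
# referenced by the cube metadata, i.e. candidates for "unused segment" cleanup.
#   $1: file listing segment directory names found on storage (one per line)
#   $2: file listing segment names still referenced by the cube (hypothetical input)
unused_segments() {
  # comm needs sorted input; -23 keeps lines that appear only in the first list
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

With seg_a, seg_b and seg_c on storage and only seg_b still in the metadata, the function prints seg_a and seg_c.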

Usage:

1. Check which resources can be cleaned up. This step does not remove anything:

export KYLIN_HOME=/path/to/kylin_home
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false


2. Pick one or two of the listed resources and verify that they are no longer referenced. Then add the "--delete true" option to start the cleanup:

${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
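The two steps above can be wrapped in a small shell function that saves the dry-run output, gives you a chance to spot-check it, and only then runs the deletion. This is a sketch under assumptions: the function name storage_cleanup, the report file name, and the confirmation prompt are illustrative and not part of Kylin; only the two kylin.sh invocations come from the steps above. It also assumes the job prints its candidate list to the console rather than only to a log file.

```shell
#!/usr/bin/env bash
# Sketch of the two-step cleanup: dry run, review, then delete on confirmation.
# storage_cleanup, the report name and the prompt are illustrative, not Kylin features.
storage_cleanup() {
  local kylin_sh="${KYLIN_HOME:?KYLIN_HOME must be set}/bin/kylin.sh"
  local report="${1:-storage-cleanup-$(date +%Y%m%d-%H%M%S).log}"
  local answer=""

  # Step 1: list what could be cleaned up; --delete false removes nothing.
  # Assumption: the job prints its candidate list to stdout/stderr.
  if ! "$kylin_sh" org.apache.kylin.tool.StorageCleanupJob --delete false >"$report" 2>&1; then
    echo "Dry run failed; see $report" >&2
    return 1
  fi
  cat "$report"

  # Step 2: spot-check a few listed resources before answering.
  read -r -p "Delete the resources listed in $report? [y/N] " answer || true
  if [[ "$answer" == [yY] ]]; then
    "$kylin_sh" org.apache.kylin.tool.StorageCleanupJob --delete true
  else
    echo "Aborted; nothing was deleted."
  fi
}
```

After sourcing the function, run export KYLIN_HOME=/path/to/kylin_home and then storage_cleanup; any answer other than y leaves storage untouched, and the dry-run report stays on disk for later reference.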