How to clean up storage in Kylin 4

Background

Kylin generates temporary files in HDFS during cube building. In addition, when cubes are purged, dropped, or merged, some parquet files may be left in HDFS even though they will no longer be queried. Although Kylin performs some automated garbage collection, it might not cover all cases, so you can run an offline storage cleanup periodically.

Directory tree structure under Kylin 4.0's working dir

Working Dir

{PROJECT_NAME}
- parquet [managed by tool]
  - {CUBE_NAME}
    - {SEGMENT_NAME}
      - {CUBOID_ID}
        - parquet files
- spark_log
  - driver
    - {JOB_ID}
      - drivers' log of cubing job
  - executor
    - {JOB_ID}
      - executors' log of cubing job
- dict/global_dict [managed by tool]
  - {CUBE_NAME}
    - {COLUMN_NAME}
      - dict files
- table_snapshot [managed by tool]
  - {SCHEMA_NAME.TABLE_NAME}
    - {JOB_ID}
      - parquet files
- job_tmp [managed by tool]
  - {JOB_ID}
    - TBD
cube_statistics
- {CUBE_NAME}
  - {JOB_ID}
    - seq file of cuboid's HLL
_sparder_log
- {DATE}
  - executors' log of query job
resources-jdbc
- TBD
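
To make the layout above easier to reason about when reviewing what occupies the working dir, here is a minimal Python sketch that maps a path (relative to the working dir) onto the tree. It is only an illustration of the structure shown above, not part of Kylin: the function names `classify` and `parquet_dir`, the sample project name, and the sample working dir are hypothetical, and the "managed" flag simply mirrors the "[managed by tool]" marks in the tree.

```python
from pathlib import PurePosixPath

# Entries that sit next to the project directories in the tree above.
TOP_LEVEL_DIRS = {"cube_statistics", "_sparder_log", "resources-jdbc"}

# Per-project areas from the tree. True means the tree marks the area
# "[managed by tool]"; "dict" stands for the "dict/global_dict" entry.
PROJECT_AREAS = {
    "parquet": True,
    "spark_log": False,
    "dict": True,
    "table_snapshot": True,
    "job_tmp": True,
}


def classify(relative_path):
    """Map a path relative to the working dir onto the tree (hypothetical helper)."""
    parts = PurePosixPath(relative_path.strip("/")).parts
    if not parts:
        raise ValueError("path must not be empty")
    if parts[0] in TOP_LEVEL_DIRS:
        # Top-level entries carry no "[managed by tool]" mark in the tree.
        return {"project": None, "area": parts[0], "managed_by_tool": False}
    area = parts[1] if len(parts) > 1 else None
    return {
        "project": parts[0],
        "area": area,
        "managed_by_tool": PROJECT_AREAS.get(area, False),
    }


def parquet_dir(working_dir, project, cube, segment, cuboid_id):
    """Build {working_dir}/{PROJECT_NAME}/parquet/{CUBE_NAME}/{SEGMENT_NAME}/{CUBOID_ID}."""
    return PurePosixPath(working_dir) / project / "parquet" / cube / segment / str(cuboid_id)


if __name__ == "__main__":
    # Illustrative names only; substitute your own project, cube, and segment.
    print(classify("my_project/spark_log/driver/some_job_id"))
    print(parquet_dir("/kylin/my_metadata", "my_project", "my_cube", "my_segment", 1))
```

The sketch works on path strings only and never touches HDFS, so it is safe to run anywhere; pairing it with a real directory listing is left to the reader.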
