Apache Kylin : Analytical Data Warehouse for Big Data

Background

Kylin generates temporary files in HDFS while building cubes. In addition, purging, dropping, or merging cubes can leave behind parquet files that are no longer queried. Kylin performs some automated garbage collection, but it may not cover every case, so you can run an offline storage cleanup periodically.

Directory tree structure under the Kylin 4.0 working dir


Working Dir

  • {PROJECT_NAME}
    • parquet [managed by tool]
      • {CUBE_NAME}
        • {SEGMENT_NAME}
          • {CUBOID_ID}
            • parquet files
    • spark_log
      • driver
        • {JOB_ID}
          • drivers' log of cubing job
      • executor
        • {JOB_ID}
          • executors' log of cubing job
    • dict/global_dict [managed by tool]
      • {CUBE_NAME}
        • {COLUMN_NAME}
          • dict files
    • table_snapshot [managed by tool]
      • {SCHEMA_NAME.TABLE_NAME}
        • {JOB_ID}
          • parquet files
    • job_tmp [managed by tool]
      • {JOB_ID}
        • TBD
  • cube_statistics
    • {CUBE_NAME}
      • {JOB_ID}
        • seq file of cuboid's HLL
  • _sparder_log
    • {DATE}
      • executors' log of query job
  • resources-jdbc
    • TBD
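
To make the layout above concrete, here is a minimal Python sketch that composes the tool-managed paths from their parts. The function names, the working dir `/kylin/working_dir`, and the project, cube, and segment names are hypothetical and only illustrate the tree; they are not part of Kylin.

```python
import posixpath


def parquet_dir(working_dir, project, cube, segment, cuboid_id):
    # {PROJECT_NAME}/parquet/{CUBE_NAME}/{SEGMENT_NAME}/{CUBOID_ID}
    return posixpath.join(working_dir, project, "parquet", cube, segment, str(cuboid_id))


def global_dict_dir(working_dir, project, cube, column):
    # {PROJECT_NAME}/dict/global_dict/{CUBE_NAME}/{COLUMN_NAME}
    return posixpath.join(working_dir, project, "dict", "global_dict", cube, column)


def table_snapshot_dir(working_dir, project, table, job_id):
    # {PROJECT_NAME}/table_snapshot/{SCHEMA_NAME.TABLE_NAME}/{JOB_ID}
    return posixpath.join(working_dir, project, "table_snapshot", table, job_id)


def job_tmp_dir(working_dir, project, job_id):
    # {PROJECT_NAME}/job_tmp/{JOB_ID}
    return posixpath.join(working_dir, project, "job_tmp", job_id)


if __name__ == "__main__":
    # All names below are made up for illustration.
    wd = "/kylin/working_dir"
    print(parquet_dir(wd, "demo_project", "sales_cube", "seg_2021", 255))
    print(global_dict_dir(wd, "demo_project", "sales_cube", "BUYER_ID"))
```

These four directories are the ones marked [managed by tool] in the tree, so they are the locations a cleanup run may touch.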


How to use

options
usage: org.apache.kylin.rest.job.StorageCleanupJob
 -cleanupGlobalDict <cleanupGlobalDict>         Boolean, whether or not to
                                                delete unreferenced global
                                                dict files. Default value
                                                is true.
 -cleanupJobTmp <cleanupJobTmp>                 Boolean, whether or not to
                                                delete job tmp files.
                                                Default value is false.
 -cleanupTableSnapshot <cleanupTableSnapshot>   Boolean, whether or not to
                                                delete unreferenced
                                                snapshot files. Default
                                                value is true.
 -cleanupThreshold <cleanupThreshold>           Integer, number of hours.
                                                Only unreferenced storage
                                                not modified within that
                                                many hours is deleted
                                                (recent files are
                                                protected). Default value
                                                is 168 hours.
 -delete <delete>                               Boolean, whether or not to
                                                actually delete files.
                                                Default value is false,
                                                which means a dry run.
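
The options above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the tool's implementation: it assumes the job is launched through `bin/kylin.sh` under a hypothetical install path `/opt/kylin`, it writes the flags exactly as the usage text prints them, and `is_protected` reflects one reading of `-cleanupThreshold` (how the tool treats a file exactly at the threshold is not stated here). The sketch only builds the command; it does not run it.

```python
from datetime import datetime, timedelta

# Class name taken from the usage text above.
JOB_CLASS = "org.apache.kylin.rest.job.StorageCleanupJob"


def build_cleanup_command(kylin_home, delete=False, threshold_hours=168):
    """Return the argument list for one cleanup run (nothing is executed)."""
    # Assumption: the job is started via bin/kylin.sh; adjust for your install.
    launcher = kylin_home.rstrip("/") + "/bin/kylin.sh"
    return [
        launcher,
        JOB_CLASS,
        "-delete", str(delete).lower(),  # false keeps the run a dry run
        "-cleanupThreshold", str(threshold_hours),
    ]


def is_protected(last_modified, now, threshold_hours=168):
    """True if the file was modified within the threshold window."""
    return now - last_modified < timedelta(hours=threshold_hours)


if __name__ == "__main__":
    print(" ".join(build_cleanup_command("/opt/kylin")))
    now = datetime(2021, 1, 8, 12, 0)
    print(is_protected(datetime(2021, 1, 7), now))
    print(is_protected(datetime(2020, 12, 1), now))
```

A cautious routine is to run with the default `-delete false` first, review what the dry run reports, and only then repeat the run with `-delete true`.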






