
Apache Kylin : Analytical Data Warehouse for Big Data

...

The system cube is a set of cubes that Kylin creates for self-monitoring. It has been supported since Kylin 2.3.0.

In Kylin 3.x and Kylin 2.x, built segment data is stored in HBase, so the query metrics collected by the system cube are mostly related to HBase RPC. Kylin 4 implements a new build and query engine and replaces HBase storage with Parquet storage, so the original metrics no longer exist in Kylin 4.

To make the system cube work properly in Kylin 4 and help users monitor builds and queries, we need to define a new set of query metrics for the system cube. The structure of the three query-related Hive tables that the corresponding system cubes depend on changes as well.

After the system cube is enabled, every query and build operation in Kylin is recorded in Hive tables. There are five Hive tables, which serve as the fact tables of the five system cubes:

| Hive table name | Description | System cube name |
|---|---|---|
| hive_metrics_query_execution_qa | Collects query-level and Spark execution-level metrics | KYLIN_HIVE_METRICS_QUERY_EXECUTION_QA |
| hive_metrics_query_spark_job_qa | Collects Spark job-level information for queries | KYLIN_HIVE_METRICS_QUERY_SPARK_JOB_QA |
| hive_metrics_query_spark_stage_qa | Collects Spark stage-level information for queries | KYLIN_HIVE_METRICS_QUERY_SPARK_STAGE_QA |
| hive_metrics_job_qa | Collects job-related metrics | KYLIN_HIVE_METRICS_JOB_QA |
| hive_metrics_job_exception_qa | Collects job exception-related metrics | KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA |

2. Configuration

By default, the system cube is disabled. To enable it, add the following configuration to "conf/kylin.properties":

kylin.metrics.monitor-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true

The system cube is generally used together with the Dashboard. To enable the Dashboard, add the following configuration:

kylin.web.dashboard-enabled=true
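For a scripted installation, the four switches above could be applied with a small shell function such as the one below. This is a minimal sketch, not a Kylin tool: it assumes KYLIN_HOME points to the installation and that the settings are plain key=value lines in $KYLIN_HOME/conf/kylin.properties. The function names set_kylin_prop and enable_system_cube are hypothetical.

```shell
# Sketch only (hypothetical helpers, not shipped with Kylin).
# Assumes KYLIN_HOME is set and kylin.properties uses plain "key=value" lines.

# set_kylin_prop FILE KEY VALUE
# Rewrites every line that starts with "KEY=" or appends "KEY=VALUE" if none exists.
set_kylin_prop() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    # Rewrite all matching lines so that no stale duplicate can override the new value.
    index($0, (k "=")) == 1 { print k "=" v; found = 1; next }
    { print }
    # Key not present anywhere: append it at the end of the file.
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back instead of mv so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# enable_system_cube: turns on the metrics reporters and the Dashboard.
enable_system_cube() {
  if [ -z "${KYLIN_HOME:-}" ]; then
    echo "KYLIN_HOME is not set" >&2
    return 1
  fi
  local props="$KYLIN_HOME/conf/kylin.properties"
  set_kylin_prop "$props" kylin.metrics.monitor-enabled true &&
    set_kylin_prop "$props" kylin.metrics.reporter-query-enabled true &&
    set_kylin_prop "$props" kylin.metrics.reporter-job-enabled true &&
    set_kylin_prop "$props" kylin.web.dashboard-enabled true
}
```

Every existing line for a key is rewritten, not only the first, because in a Java properties file a later duplicate overrides an earlier one. Running the function twice leaves the file unchanged.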

When Kylin 4 collects query metrics, it temporarily keeps each query's metrics record in an in-memory cache. When a record has stayed in the cache longer than the expiration time, or when the number of records exceeds the maximum capacity, the records evicted from the cache are packaged in a fixed format and passed to the metrics system. The expiration time and maximum capacity are controlled by the following settings, whose default values are 300 (seconds) and 10000 (records). You can find and modify them in the "conf/kylin.properties" file:

kylin.metrics.query-cache.expire-seconds=300
kylin.metrics.query-cache.max-entries=10000
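To check which values are actually in effect, a helper along these lines could read a key from kylin.properties and fall back to the documented default. It is a sketch under stated assumptions: get_kylin_prop is a hypothetical name, only plain key=value lines are handled, and the last occurrence of a key is taken to win, as it does with java.util.Properties.

```shell
# Sketch only (hypothetical helper, not shipped with Kylin).
# get_kylin_prop FILE KEY DEFAULT
# Prints the value of KEY from FILE, or DEFAULT if the key is absent or empty.
# Assumes plain "key=value" lines; the last matching line wins.
get_kylin_prop() {
  local file=$1 key=$2 default=$3 value
  value=$(awk -v k="$key" '
    # Remember the text after "KEY=" on each matching line; the last one is kept.
    index($0, (k "=")) == 1 { v = substr($0, length(k) + 2); found = 1 }
    END { if (found) print v }
  ' "$file" 2>/dev/null)
  # An absent key, an empty value, or an unreadable file yields DEFAULT.
  printf '%s\n' "${value:-$default}"
}
```

For example, `get_kylin_prop "$KYLIN_HOME/conf/kylin.properties" kylin.metrics.query-cache.max-entries 10000` prints the configured cache capacity, or 10000 if the key is not set.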

The records in the metrics system are saved to HDFS after a certain period of time or once a certain number of records has accumulated. By default, the period is 10 minutes and the number is 10 records. You can change these two values in the configuration file $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/classes/kylinMetrics.xml: the configuration item with "index = 1" sets how many records are inserted into Hive at a time, and the configuration item with "index = 2" sets how often, in minutes, records are inserted into Hive.
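The time-or-count rule can be stated as a simple predicate. The sketch below is illustrative only and is not taken from Kylin's source: should_flush is a hypothetical name, and it assumes that a batch is written as soon as either threshold is reached (10 records or 10 minutes by default), that both thresholds are compared inclusively, and that an empty batch is never written.

```shell
# Illustrative sketch, not Kylin code.
# should_flush PENDING_RECORDS MINUTES_SINCE_LAST_FLUSH [MAX_RECORDS] [MAX_MINUTES]
# Exit status 0 means "write the pending records to Hive now".
# Assumption: whichever threshold is reached first triggers the write.
should_flush() {
  local pending=$1 elapsed=$2 max_records=${3:-10} max_minutes=${4:-10}
  # Nothing buffered: there is nothing to write.
  [ "$pending" -gt 0 ] || return 1
  # Count threshold (the "index = 1" item) or time threshold (the "index = 2" item).
  [ "$pending" -ge "$max_records" ] || [ "$elapsed" -ge "$max_minutes" ]
}
```

With the defaults, 10 buffered records trigger a write immediately, while 3 records wait until 10 minutes have passed.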

3. Create the system cube

Before using the system cube, you need to prepare the Hive tables and cubes listed in the table above. You can create the system cube either manually or automatically with system-cube.sh.

3.2 Create system cube manually

Step1 Prepare the configuration file

Create a configuration file SCSinkTools.json in the $KYLIN_HOME directory. For example:

...

Generate the system cube metadata:

./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator \
-inputConfig SCSinkTools.json \
-output <output_folder>
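Because the command refers to SCSinkTools.json and bin/kylin.sh by relative path, it has to be run from $KYLIN_HOME. A hypothetical wrapper that enforces this and fails early when the configuration file is missing might look like the following; the function name generate_system_cube_metadata is an assumption, and a relative output directory is resolved against $KYLIN_HOME.

```shell
# Hypothetical helper, not shipped with Kylin.
# generate_system_cube_metadata OUTPUT_DIR
# Runs SCCreator from $KYLIN_HOME; a relative OUTPUT_DIR is resolved against $KYLIN_HOME.
generate_system_cube_metadata() {
  if [ $# -ne 1 ]; then
    echo "usage: generate_system_cube_metadata OUTPUT_DIR" >&2
    return 2
  fi
  local out=$1
  # Subshell: the caller's working directory is left unchanged.
  (
    if [ -z "${KYLIN_HOME:-}" ]; then
      echo "KYLIN_HOME is not set" >&2
      exit 1
    fi
    cd "$KYLIN_HOME" || exit 1
    # SCSinkTools.json is expected in $KYLIN_HOME (see Step1).
    if [ ! -f SCSinkTools.json ]; then
      echo "SCSinkTools.json not found in $KYLIN_HOME" >&2
      exit 1
    fi
    ./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator \
      -inputConfig SCSinkTools.json \
      -output "$out"
  )
}
```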

...

By default, these tables are created in the Hive database named "kylin". You can change this default through the configuration item "kylin.metrics.prefix".

Step4 Restore metadata

Then restore the system cube metadata to the Kylin metastore with the following command:

...

Finally, reload the metadata in the Kylin Web UI. A group of system cubes then appears in the project named KYLIN_SYSTEM.

...

After the system cubes are created, they need to be built regularly, so that the metrics continuously written into Hive can be queried quickly in Kylin.

You can use the following methods to build the system cube on a regular basis:

...

  • Create the system cube: sh bin/system-cube.sh setup

  • Build the system cube: sh bin/system-cube.sh build

  • Add scheduled build jobs for the system cube: sh bin/system-cube.sh cron

...