
Apache Kylin : Analytical Data Warehouse for Big Data


1. Background

The system cube is a set of cubes created by Kylin for better self-monitoring. It has been supported since Kylin 2.3.0.

In Kylin 2.x and Kylin 3.x, the built cube data is stored in HBase, so the query metrics collected by the system cube are mostly HBase RPC related metrics. Kylin 4 implements new build and query engines, and HBase storage has been replaced by the new Parquet storage, so the original metrics no longer exist in Kylin 4.

To make the system cube work properly in Kylin 4 and help users monitor builds and queries, the system cube query metrics have been redefined, and the structure of the three query-related Hive tables that the corresponding system cubes depend on has also changed.

After the system cube is enabled, every query or build operation in Kylin is recorded in Hive. There are five Hive tables, which correspond to the fact tables of the five system cubes:

Hive Table Name                   | Description                                                    | System Cube Name
hive_metrics_query_execution_qa   | Collect query level and Spark execution level related metrics  | KYLIN_HIVE_METRICS_QUERY_EXECUTION_QA
hive_metrics_query_spark_job_qa   | Collect query Spark job level related information              | KYLIN_HIVE_METRICS_QUERY_SPARK_JOB_QA
hive_metrics_query_spark_stage_qa | Collect query Spark stage level related information            | KYLIN_HIVE_METRICS_QUERY_SPARK_STAGE_QA
hive_metrics_job_qa               | Collect job related metrics                                     | KYLIN_HIVE_METRICS_JOB_QA
hive_metrics_job_exception_qa     | Collect job exception related metrics                           | KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA

2. Configuration


By default, the system cube is disabled. To enable it, you need to add the following configuration:

kylin.metrics.monitor-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true

Generally, the system cube is used together with the Dashboard. You can enable the Dashboard with the following configuration:

kylin.web.dashboard-enabled=true

When Kylin 4 collects query related metrics, it temporarily keeps each query's metrics record in an in-memory cache. When a record has stayed in the cache longer than the expiration time, or the total number of records exceeds the maximum capacity, the records to be evicted from the cache are packaged into a fixed format and passed to the MetricsSystem. The expiration time and maximum capacity are controlled by the following configurations; their default values are 300 (seconds) and 10000 (entries), and you can change them in kylin.properties:

kylin.metrics.query-cache.expire-seconds=300
kylin.metrics.query-cache.max-entries=10000

Records in the MetricsSystem are saved to HDFS only after a certain period of time has passed or a certain number of records has accumulated. By default, the period is 10 minutes and the number is 10 records. You can change these two values through the configuration items in $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/classes/kylinMetrics.xml: the configuration item with index = 1 specifies how many accumulated records force a flush into Hive, and the item with index = 2 specifies after how many minutes a flush into Hive happens.
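
The snippet below is only a rough sketch of what those entries look like; the bean id, class, and surrounding structure are assumptions based on older Kylin versions, so check the kylinMetrics.xml shipped with your installation for the exact layout:

Code Block
languagexml
titlekylinMetrics.xml (sketch)
<!-- Sketch only: verify the bean id/class against your own kylinMetrics.xml -->
<bean id="blockingReservoir" class="org.apache.kylin.metrics.lib.impl.BlockingReservoir">
    <!-- index = 1: flush to Hive once this many records have accumulated (default 10) -->
    <constructor-arg index="1" value="10"/>
    <!-- index = 2: flush to Hive at least every N minutes (default 10) -->
    <constructor-arg index="2" value="10"/>
</bean>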

3. Create system cube

Before using the system cube, you need to prepare the Hive tables and cubes listed in the table above. You can either create the system cubes manually or create them automatically with the system-cube.sh script.

3.1 Create system cube manually

Step 1: Prepare the configuration file

Create a configuration file named SCSinkTools.json in the $KYLIN_HOME directory. For example:

Code Block
titleSCSinkTools.json
linenumberstrue
collapsetrue
[
    {
       "sink": "hive",
       "storage_type": 4,
       "cube_desc_override_properties": {
         "kylin.cube.max-building-segments": "1"
       }
    }
]


Step 2: Generate metadata

Run the following command in the $KYLIN_HOME directory to generate the related metadata:

Code Block
languagebash
titleGenerate metadata
collapsetrue
./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator \
-inputConfig SCSinkTools.json \
-output <output_folder>

With this command, the related metadata is generated under <output_folder>. In this example the output folder is named system_cube; it contains the SQL script used in Step 3 to create the Hive tables, as well as the cube metadata that is restored into the Kylin metastore in Step 4.

Step 3: Create Hive tables

Run the following command to create the five Hive tables listed in the table above:

Code Block
languagebash
titlecreate hive tables
linenumberstrue
collapsetrue
hive -f <output_folder>/create_hive_tables_for_system_cubes.sql

By default, these tables are created in the Hive database named kylin. The default prefix kylin can be changed through the configuration item kylin.metrics.prefix.
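
For example, to use a different prefix (the value below is only an illustration), set the following in kylin.properties, preferably before generating the metadata so that the generated cube definitions and the Hive DDL use the same prefix:

kylin.metrics.prefix=mykylin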


Step 4: Restore metadata

Then restore the system cube metadata into the Kylin metastore with the following command:

Code Block
languagebash
titlecreate system cube
linenumberstrue
collapsetrue
bin/metastore.sh restore <output_folder>

Step 5: Reload metadata

Finally, reload the metadata in the Kylin web UI. You will see a group of system cubes appear under the project named KYLIN_SYSTEM.

Step 6: Build the system cubes regularly

After the system cubes are created, they need to be built regularly as metrics are written into Hive, so that the metrics written into Hive can be quickly queried in Kylin.

You can build the system cubes on a schedule as follows:

1. Create a shell script that builds a system cube by calling org.apache.kylin.tool.job.CubeBuildingCLI. For example:

Code Block
languagebash
titleshell
linenumberstrue
collapsetrue
#!/bin/bash

# Resolve KYLIN_HOME relative to the location of this script
dir=$(dirname ${0})
export KYLIN_HOME=${dir}/../

CUBE=$1       # name of the system cube to build
INTERVAL=$2   # build interval in milliseconds (e.g. 3600000 = 1 hour)
DELAY=$3      # delay in milliseconds, subtracted from the current time so the most recent data is left out

# Align the segment end time to the previous interval boundary
CURRENT_TIME_IN_SECOND=`date +%s`
CURRENT_TIME=$((CURRENT_TIME_IN_SECOND * 1000))
END_TIME=$((CURRENT_TIME - DELAY))
END=$((END_TIME - END_TIME % INTERVAL))

ID="$END"
echo "building for ${CUBE}_${ID}" >> ${KYLIN_HOME}/logs/build_trace.log
sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube ${CUBE} --endTime ${END} > ${KYLIN_HOME}/logs/system_cube_${CUBE}_${END}.log 2>&1 &
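
For example, assuming the script above is saved as ${KYLIN_HOME}/bin/system_cube_build.sh (the name used in the cron entries below), a one-off build of a single system cube can be triggered like this; the three arguments are the cube name, the build interval in milliseconds, and the delay in milliseconds:

Code Block
languagebash
titleexample run
sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000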

2. Run the shell script on a schedule, for example by adding cron jobs like the following:

Code Block
languagebash
titlecron job
linenumberstrue
collapsetrue
0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_EXECUTION_QA 3600000 1200000

20 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_SPARK_JOB_QA 3600000 1200000

40 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_SPARK_STAGE_QA 3600000 1200000

30 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000

50 */12 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA 3600000 12000
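
To install the schedule, add the entries above to the crontab of the OS user that runs Kylin, adjusting cube names, intervals, and paths to your environment:

Code Block
languagebash
titleinstall cron entries
crontab -e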

3.2 Automatically create system cube

You can use the ${KYLIN_HOME}/bin/system-cube.sh script to complete the above operations automatically:

  • Create the system cubes: sh bin/system-cube.sh setup

  • Build the system cubes: sh bin/system-cube.sh build

  • Add scheduled build jobs for the system cubes: sh bin/system-cube.sh cron