Apache Kylin : Analytical Data Warehouse for Big Data
For the official documentation, see http://kylin.apache.org/cn/docs/tutorial/setup_systemcube.html
1. Background
Kylin provides a System Cube for better monitoring of the Kylin system. The System Cube records query- and job-related metrics, which helps with system operations and maintenance. As a monitoring facility, the System Cube also serves Cube Planner and the Dashboard.
The System Cube records query- and job-related metrics, which helps with system operations and problem discovery. After a user enables the System Cube, a project named KYLIN_SYSTEM appears in Kylin's web UI; it contains 5 Cubes, each recording monitoring data from a different angle. Users can analyze these metrics in further scenarios to better operate and monitor Kylin. The System Cube also serves the Dashboard and phase two of Cube Planner.
2. The Hive Tables for System Cube
Every query or build operation a user performs in Kylin is recorded in Hive tables. There are 5 Hive tables in total, each serving as the fact table of one of the 5 System Cubes:
Hive Table Name | Description | System Cube Name | Remarks |
---|---|---|---|
hive_metrics_query_qa | Collect query related information | hive_metrics_query_qa | |
hive_metrics_query_cube_qa | Collect query related information | hive_metrics_query_cube_qa | Related to Cube Planner |
hive_metrics_query_rpc_qa | Collect query related information | hive_metrics_query_rpc_qa | |
hive_metrics_job_qa | Collect job related information | hive_metrics_job_qa | |
hive_metrics_job_exception_qa | Collect job related information | hive_metrics_job_exception_qa | |
The following 5 configuration items relate to these Hive tables:
- kylin.metrics.prefix: by default, the system automatically creates the above 5 tables in a database named 'kylin'. You can customize the database by setting kylin.metrics.prefix=<name>; <name> is also the prefix of the System Cube names;
- kylin.metric.subject-suffix: customizes the suffix of the Hive table names; the default is 'qa', which gives table names like 'hive_metrics_query_qa';
- kylin.metrics.monitor-enabled: whether to record metrics in Hive at all; controls all 5 tables above; the default is false (do not record);
- kylin.metrics.reporter-query-enabled: whether to record metrics for the 3 query-related tables; the default is false (do not record);
- kylin.metrics.reporter-job-enabled: whether to record metrics for the 2 job-related tables; the default is false (do not record);
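Putting these items together, a kylin.properties fragment that turns on metrics recording might look like the following. This is a sketch built only from the configuration items listed above; adjust values to your deployment:

```properties
# Master switch: record metrics into the 5 Hive tables (default: false)
kylin.metrics.monitor-enabled=true
# Record the 3 query-related tables (default: false)
kylin.metrics.reporter-query-enabled=true
# Record the 2 job-related tables (default: false)
kylin.metrics.reporter-job-enabled=true
# Optional: database and System Cube name prefix (default: kylin)
kylin.metrics.prefix=kylin
```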
2.1 How to record query metrics into Hive
A Cube can contain multiple Segments, and the data of each Segment may be stored on different RPC servers. When a user sends a query, it may hit multiple Cubes, scan multiple Segments under each Cube, and scan data stored on multiple RPC servers under each Segment. So, for a single query:
- Each Cube hit produces one row in hive_metrics_query_qa;
- For each Cube hit, each Segment scanned under that Cube produces one row in hive_metrics_query_cube_qa;
- For each Segment scanned, each RPC server whose data is scanned produces one row in hive_metrics_query_rpc_qa. (Tip: expand the Cube details and check Region Count under the Storage tab to see how many RPC target servers a Segment's data is stored on.)
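To make the fan-out above concrete, here is a small sketch (a hypothetical query plan, not Kylin code) that computes how many rows a single query adds to each of the three query tables:

```python
# Hypothetical fan-out of one query: it hits 2 Cubes; for each Cube we list
# the Segments scanned and, per Segment, the number of RPC servers contacted.
query_plan = {
    "kylin_sales_cube": {"seg_201": 3, "seg_202": 2},  # segment -> RPC server count
    "kylin_user_cube":  {"seg_101": 1},
}

rows_query = len(query_plan)                                        # 1 row per Cube hit
rows_query_cube = sum(len(segs) for segs in query_plan.values())    # 1 row per Segment scanned
rows_query_rpc = sum(n for segs in query_plan.values()
                     for n in segs.values())                        # 1 row per RPC call

print(rows_query, rows_query_cube, rows_query_rpc)  # 2 3 6
```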
2.2 How to record job metrics into Hive
- For table hive_metrics_job_qa, each successful job produces one row;
- For table hive_metrics_job_exception_qa, each failed job (a job that hits an exception) produces one row.
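The job-side rules can be sketched the same way. This uses hypothetical job outcomes and assumes the exception table records failed jobs:

```python
# Hypothetical build-job outcomes: True = success, False = failed with exception
jobs = {"job-a": True, "job-b": True, "job-c": False}

rows_job = sum(1 for ok in jobs.values() if ok)                 # rows in hive_metrics_job_qa
rows_job_exception = sum(1 for ok in jobs.values() if not ok)   # rows in hive_metrics_job_exception_qa

print(rows_job, rows_job_exception)  # 2 1
```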
2.3 Some tips about recording metrics
Metrics are inserted into the Hive tables with some delay: the system flushes records either after a fixed time interval or once a fixed batch of records has accumulated. By default the interval is 10 minutes and the batch size is 10 records. If you want to verify quickly, there are two options:
- Edit the configuration in $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/classes/kylinMetrics.xml: the "index=1" item is the batch size (flush once this many records have accumulated), and the "index=2" item is the time interval in minutes (flush once this much time has passed);
- Restart Kylin, which immediately records all pending data.
You can then enter Hive and query the corresponding tables to confirm that the data has been inserted, for example:

```
hive
use kylin;
select * from hive_metrics_query_cube_qa;
```
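The batching behavior described in the tips above (flush after a fixed number of records or a fixed time interval, whichever comes first) can be sketched as follows. This is a simplification for illustration, not the actual kylinMetrics.xml reporter implementation:

```python
import time

class MetricsBuffer:
    """Simplified sketch of a batch-size / time-interval flush policy
    (Kylin's defaults: 10 records or 10 minutes)."""

    def __init__(self, max_batch=10, max_interval_s=600):
        self.max_batch = max_batch
        self.max_interval_s = max_interval_s
        self.buffer = []
        self.last_flush = time.monotonic()
        self.flushed = []  # stands in for the Hive table

    def add(self, record):
        self.buffer.append(record)
        now = time.monotonic()
        if len(self.buffer) >= self.max_batch or now - self.last_flush >= self.max_interval_s:
            self.flush(now)

    def flush(self, now=None):
        # In Kylin this would be an insert into the Hive table.
        self.flushed.extend(self.buffer)
        self.buffer.clear()
        self.last_flush = now if now is not None else time.monotonic()

buf = MetricsBuffer(max_batch=3, max_interval_s=600)
for i in range(7):
    buf.add({"query_id": i})
print(len(buf.flushed), len(buf.buffer))  # 6 1 (two batch flushes, one record still pending)
```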
2.4 The Relationship Between Hive Tables And System Cube
After the System Cube is enabled, a project named KYLIN_SYSTEM appears in Kylin. It contains 5 Cubes; each Cube has one fact table (no lookup tables), and each fact table corresponds to one Hive table.
- Column: the name of Hive table column
- Type: the type of this column
- Description: the description of this column
- Sample: Sample data in Hive
- D/M: this column is set as Dimension or Measure in System Cube
- Measure Function: if the column is set as a Measure, the aggregate function of the measure
- Remarks: other supplementary information
2.4.1 hive_metrics_query_qa
Column | Type | Description | Sample | D/M | Measure Function | Remarks |
---|---|---|---|---|---|---|
query_hash_code | bigint | query unique id | 7708685990456150000 | M | COUNT_DISTINCT | each SQL statement maps to a unique query_hash_code; running the same SQL again does not generate a new one |
host | string | the host of server for query engine | cdh-client:10.1.3.91 | D | ||
kuser | string | user name | ADMIN | D | ||
project | string | project name | LEARN_KYLIN | D | ||
realization | string | cube name | kylin_sales_cube_SIMPLE | D | ||
realization_type | int | the storage type | 2 | D | ||
query_type | string | CACHE, OLAP, LOOKUP_TABLE, HIVE (users can query on different data sources) | OLAP, CACHE | D | ||
exception | string | It's for classifying different exception types (when doing query, exceptions may happen) | NULL, java.lang.NumberFormatException | D | ||
query_time_cost | bigint | the time cost for the whole query | 1392 | M | MIN/SUM/MAX/PERCENTILE_APPROX | |
calcite_count_return | bigint | the row count of the result Calcite returns | 3 | M | SUM/MAX | the number of rows Calcite returns to Kylin, call it n1 |
storage_count_return | bigint | the row count of the input to Calcite | 3 | M | SUM/MAX | the number of rows the storage layer returns to Calcite, call it n2 |
calcite_count_aggregate_filter | bigint | the row count Calcite aggregates and filters | 0 | M | SUM/MAX | the number of rows filtered or aggregated inside Calcite, i.e. n2 - n1 |
ktimestamp | bigint | query begin time (timestamp) | 1600938970920 | |||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | ||
kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | ||
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/9/20 | D | ||
kday_time | string | query begin time (time) | 17:16:10 | D | ||
ktime_hour | int | query begin time (hour) | 17 | D | ||
ktime_minute | int | query begin time (minute) | 16 | D | ||
ktime_second | int | query begin time (second) | 10 | |||
kday_date | string | query begin time (day) | 2020/9/24 | | | Hive table partition column |
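The three row-count measures in this table satisfy a simple relationship; here it is checked against the sample values above (n1 and n2 as in the remarks column):

```python
# Sample row from hive_metrics_query_qa (values from the table above)
storage_count_return = 3   # n2: rows the storage layer returned to Calcite
calcite_count_return = 3   # n1: rows Calcite returned to Kylin

# Rows filtered or aggregated inside Calcite: n2 - n1
calcite_count_aggregate_filter = storage_count_return - calcite_count_return

print(calcite_count_aggregate_filter)  # 0
```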
2.4.2 hive_metrics_query_cube_qa
Column | Type | Description | Sample | D/M | Measure Function | Remarks |
---|---|---|---|---|---|---|
host | string | the host of server for query engine | cdh-client:10.1.3.91 | |||
project | string | project name | PEARVIDEOAPP | |||
cube_name | string | cube name | UserActionPhaseOneCube | D | ||
segment_name | string | segment name | 20201011000000_20201012000000 | D | ||
cuboid_source | bigint | source cuboid parsed based on query and Cube design | 12582912 | D | | the cuboid that best matches the query; a cuboid exactly matching the query pattern may not exist or may not have been built yet |
cuboid_target | bigint | target cuboid already precalculated and served for source cuboid | 13041664 | D | | the cuboid actually used by the query; post-aggregation may be needed to answer the query |
if_match | boolean | whether source cuboid and target cuboid are equal | FALSE | D | ||
filter_mask | bigint | | 4194304 | D | ||
if_success | boolean | whether a query on this Cube is successful or not | TRUE | D | ||
weight_per_hit | double | the reciprocal of the number of Cubes hit by a single query | 1 | M | SUM | |
storage_call_count | bigint | the number of rpc calls for a query hit on this Cube | 1 | M | SUM/MAX | |
storage_call_time_sum | bigint | sum of time cost for the rpc calls of a query | 268 | M | SUM/MAX | |
storage_call_time_max | bigint | max of time cost among the rpc calls of a query | 268 | M | SUM/MAX | |
storage_count_skip | bigint | the sum of row count skipped for the related rpc calls | 0 | M | SUM/MAX | |
storage_count_scan | bigint | the sum of row count scanned for the related rpc calls | 929 | M | SUM/MAX | |
storage_count_return | bigint | the sum of row count returned for the related rpc calls | 45 | M | SUM/MAX | |
storage_count_aggregate_filter | bigint | the sum of row count aggregated and filtered for the related rpc calls (= STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN) | 884 | M | SUM/MAX | |
storage_count_aggregate | bigint | the sum of row count aggregated for the related rpc calls | 36 | M | SUM/MAX | |
ktimestamp | bigint | query begin time (timestamp) | 1603462676906 | |||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | ||
kmonth_begin_date | string | query begin time (month) | 2020/10/1 | D | ||
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/10/18 | D | ||
kday_time | string | query begin time (time) | 22:17:56 | D | ||
ktime_hour | int | query begin time (hour) | 22 | D | ||
ktime_minute | int | query begin time (minute) | 17 | D | ||
ktime_second | int | query begin time (second) | 56 | |||
kday_date | string | query begin time (day) | 2020/10/23 | | | Hive table partition column |
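Two of the measures above are worth spelling out with the sample values: storage_count_aggregate_filter is the scanned rows minus the returned rows, and weight_per_hit is the reciprocal of the number of Cubes a query hits, so summing it counts queries without double-counting multi-Cube queries. A quick sketch (the cube count of 2 is illustrative):

```python
# Sample row from hive_metrics_query_cube_qa (values from the table above)
storage_count_scan = 929
storage_count_return = 45
storage_count_aggregate_filter = storage_count_scan - storage_count_return
print(storage_count_aggregate_filter)  # 884

# weight_per_hit: reciprocal of the number of Cubes this query hit
cubes_hit = 2  # illustrative value
weight_per_hit = 1 / cubes_hit
print(weight_per_hit)  # 0.5
```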
2.4.3 hive_metrics_query_rpc_qa
Column | Type | Description | Sample | D/M | Measure Function |
---|---|---|---|---|---|
host | string | the host of server for query engine | cdh-client:10.1.3.91 | D | |
project | string | project name | LEARN_KYLIN | D | |
realization | string | cube name | kylin_sales_cube_SIMPLE | D | |
rpc_server | string | the rpc related target server | cdh-worker-2 | D | |
exception | string | the exception of an RPC call; if there is no exception, "NULL" is used | NULL | D | |
call_time | bigint | the time cost of an RPC call | 60 | M | SUM/MAX/PERCENTILE_APPROX |
count_return | bigint | the row count actually return | 3 | M | SUM/MAX |
count_scan | bigint | the row count actually scanned | 3 | M | SUM/MAX |
count_skip | bigint | based on fuzzy filters or other conditions, a few rows may be skipped; this indicates the skipped row count | 0 | M | SUM/MAX |
count_aggregate_filter | bigint | the row count actually aggregated and filtered (= COUNT_SCAN - COUNT_RETURN) | 0 | M | SUM/MAX |
count_aggregate | bigint | the row count actually aggregated | 0 | M | SUM/MAX |
ktimestamp | bigint | query begin time (timestamp) | 1600938970918 | ||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | |
kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | |
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/9/20 | D | |
kday_time | string | query begin time (time) | 17:16:10 | D | |
ktime_hour | int | query begin time (hour) | 17 | D | |
ktime_minute | int | query begin time (minute) | 16 | D | |
ktime_second | int | query begin time (second) | 10 | ||
kday_date | string | query begin time (day); Hive table partition column | 2020/9/24 | | |
2.4.4 hive_metrics_job_qa
Column | Type | Description | Sample | D/M | Measure Function | Remarks |
---|---|---|---|---|---|---|---|
job_id | string | job id | 51b40173-1f6c-7e55-e0ca-fbc84d242ac0 | ||||
host | string | the host of server for job engine | cdh-client:10.1.3.91 | ||||
kuser | string | user name | ADMIN | D | |||
project | string | project name | LEARN_KYLIN | D | |||
cube_name | string | cube name | kylin_sales_cube_poi | D | |||
job_type | string | job type: build, merge, optimize | BUILD | D | | (open question: are these the only three job types?) |
cubing_type | string | in Kylin, there are two cubing algorithms: Layered & Fast (In-Memory) | NULL | D | | |
duration | bigint | the duration from a job start to finish | 945001 | M | SUM/MAX/MIN/PERCENTILE_APPROX | ||
table_size | bigint | the size of data source in bytes | 227964845 | M | SUM/MAX/MIN | ||
cube_size | bigint | the size of created Cube segment in bytes | 35693596 | M | SUM/MAX/MIN | ||
per_bytes_time_cost | double | DURATION / TABLE_SIZE | 0.00414538 | M | SUM/MAX/MIN | ||
wait_resource_time | bigint | a job may include several MR (MapReduce) jobs; those MR jobs may wait due to a lack of Hadoop resources, and this records the wait time | 158146 | M | SUM/MAX/MIN | |
step_duration_distinct_columns | bigint | duration of the extract-distinct-columns step | 138586 | M | SUM/MAX | (open question: how exactly are the steps divided? an actual job has more than these 4 steps) |
step_duration_dictionary | bigint | duration of the dictionary-building step | 5311 | M | SUM/MAX | |
step_duration_inmem_cubing | bigint | duration of the in-memory cubing step | 89 | M | SUM/MAX | |
step_duration_hfile_convert | bigint | duration of the HFile-conversion step | 75382 | M | SUM/MAX | |
ktimestamp | bigint | job begin time (timestamp) | 1600938458385 | | | |
kyear_begin_date | string | job begin time (year) | 2020/1/1 | D | | |
kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D | | |
kweek_begin_date | string | job begin time (week, begins on Sunday) | 2020/9/20 | D | | |
kday_time | string | job begin time (time) | 17:07:38 | D | | |
ktime_hour | int | job begin time (hour) | 17 | D | | |
ktime_minute | int | job begin time (minute) | 7 | D | | |
ktime_second | int | job begin time (second) | 38 | | | |
kday_date | string | job begin time (day) | 2020/9/24 | | | Hive table partition column |
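The derived measure per_bytes_time_cost is simply DURATION / TABLE_SIZE; checking it against the sample row above:

```python
# Sample row from hive_metrics_job_qa (values from the table above)
duration = 945001        # job duration from start to finish, in ms
table_size = 227964845   # size of the data source, in bytes

per_bytes_time_cost = duration / table_size
print(round(per_bytes_time_cost, 8))  # 0.00414538, matching the sample value
```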
2.4.5 hive_metrics_job_exception_qa
Column | Type | Description | Sample | D/M | Measure Function |
---|---|---|---|---|---|
job_id | string | job id | a333a36d-8e33-811f-9326-f04579cc2464 | ||
host | string | the host of server for job engine | cdh-client:10.1.3.91 | ||
kuser | string | user name | ADMIN | D | |
project | string | project name | LEARN_KYLIN | D | |
cube_name | string | cube name | kylin_sales_cube | D | |
job_type | string | job type: build, merge, optimize | BUILD | D | |
cubing_type | string | in Kylin, there are two cubing algorithms: Layered & Fast (In-Memory) | LAYER | D | |
exception | string | when running a job, exceptions may happen; this classifies the different exception types | org.apache.kylin.job.exception.ExecuteException | D | |
ktimestamp | bigint | job begin time (timestamp) | 1600936844611 | |
kyear_begin_date | string | job begin time (year) | 2020/1/1 | D |
kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D |
kweek_begin_date | string | job begin time (week, begins on Sunday) | 2020/9/20 | D |
kday_time | string | job begin time (time) | 16:40:44 | D |
ktime_hour | int | job begin time (hour) | 16 | D |
ktime_minute | int | job begin time (minute) | 40 | D |
ktime_second | int | job begin time (second) | 44 | D |
kday_date | string | job begin time (day); Hive table partition column | 2020/9/24 | |
3. How To Enable System Cube
The procedure for enabling the System Cube is described in the official documentation: http://kylin.apache.org/cn/docs/tutorial/setup_systemcube.html
It generally consists of 3 parts:
- Create the Hive tables and the System Cube
- Build the System Cube
- Add a scheduled build job for the System Cube
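One common way to implement the third part is a cron-driven script that triggers builds through Kylin's REST API. The sketch below is an illustration under stated assumptions: Kylin running at localhost:7070 with the default ADMIN/KYLIN credentials, the standard PUT /kylin/api/cubes/{cube}/rebuild endpoint, and an illustrative cube name:

```python
import base64
import json
import time
import urllib.request

# Assumptions (adjust for your deployment): Kylin at localhost:7070,
# default ADMIN/KYLIN credentials; the cube name used below is illustrative.
KYLIN = "http://localhost:7070"
AUTH = base64.b64encode(b"ADMIN:KYLIN").decode()

def build_request(cube, start_ms, end_ms):
    """Construct the PUT .../rebuild request for one cube."""
    body = json.dumps({"startTime": start_ms, "endTime": end_ms,
                       "buildType": "BUILD"}).encode()
    return urllib.request.Request(
        f"{KYLIN}/kylin/api/cubes/{cube}/rebuild", data=body, method="PUT",
        headers={"Authorization": f"Basic {AUTH}",
                 "Content-Type": "application/json"})

def rebuild_last_day(cubes):
    """Run daily from cron: build the previous day's segment for each cube."""
    end = int(time.time() * 1000)
    start = end - 24 * 3600 * 1000
    for cube in cubes:
        with urllib.request.urlopen(build_request(cube, start, end)) as resp:
            print(cube, resp.status)

# Example (would contact the Kylin server):
# rebuild_last_day(["kylin_hive_metrics_query_qa"])
```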
Note that Kylin v2.x and Kylin v3.x enable the System Cube differently; make sure to follow the documentation for your version.
A demo video of enabling the System Cube is attached to this page.