Apache Kylin : Analytical Data Warehouse for Big Data
Five Hive tables record System Cube data: 3 tables record query-related metrics and 2 tables record job-related metrics. When Cube Planner is enabled, a project named KYLIN_SYSTEM appears in Kylin containing 5 Cubes, one per Hive table.

To understand the meaning of the 3 query-related Hive tables, it helps to know how a query is processed in Kylin:

1. A single SQL statement may hit multiple Cubes, for example when it contains nested subqueries.
2. A Cube hit by a query may contain multiple Segments, so data in multiple Segments must be scanned.
3. The data of a single Segment may be stored on multiple machines, so a query may need to scan data on multiple machines; each such machine is called an RPC target server. In the Kylin web UI, expand the Cube information and check Region Count under the Storage tab to see how many RPC target servers a Segment corresponds to.
Hive Table Descriptions

| Table Name | Type | Description | Notes |
|---|---|---|---|
| hive_metrics_query_qa | Query | query metrics at the highest level | One record is generated for each Cube hit by a query. |
| hive_metrics_query_cube_qa | Query | query metrics collected at the Cube level; the most important are the cuboid-related metrics, which serve Cube Planner | When a query hits a Cube, one record is generated for each Segment scanned. |
| hive_metrics_query_rpc_qa | Query | query metrics collected at the lowest level. For a query, the related aggregation and filtering can be pushed down to each RPC target server; the health of the RPC target servers is the foundation of good query performance | When a query scans a Segment, one record is generated for each RPC target server the Segment involves. For example, if a Segment spans 3 RPC target servers, scanning it generates 3 records. |
| hive_metrics_job_qa | Job | metrics of successful jobs | One record is generated for each successful job. |
| hive_metrics_job_exception_qa | Job | metrics of failed jobs | One record is generated for each failed job. |
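To make the row-generation rules concrete, here is a small Python sketch (the query shape and all counts are hypothetical) that computes how many rows a single query would add to each of the three query tables:

```python
# Hypothetical query: it hits 2 Cubes; for each Cube we list, per scanned
# Segment, the number of RPC target servers that Segment spans.
# Cube A scans 2 Segments (3 and 1 RPC target servers),
# Cube B scans 1 Segment (2 RPC target servers).
segments_per_cube = {
    "cube_a": [3, 1],  # RPC target servers per scanned Segment
    "cube_b": [2],
}

# hive_metrics_query_qa: one row per Cube hit by the query
rows_query = len(segments_per_cube)

# hive_metrics_query_cube_qa: one row per Segment scanned
rows_cube = sum(len(segs) for segs in segments_per_cube.values())

# hive_metrics_query_rpc_qa: one row per RPC target server per Segment
rows_rpc = sum(sum(segs) for segs in segments_per_cube.values())

print(rows_query, rows_cube, rows_rpc)  # 2 3 6
```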
Relationship Between Hive Table Columns and Kylin Cube Information
hive_metrics_query_qa
| Column | Type | Description | Sample (comma-separated examples) | D/M | Measure Function | Notes |
|---|---|---|---|---|---|---|
| query_hash_code | bigint | query unique id | 7708685990456150000 | M | COUNT_DISTINCT | Each SQL statement corresponds to one unique query_hash_code; running the same SQL again does not generate a new one |
| host | string | the host of the server for the query engine | cdh-client:10.1.3.91 | D | | |
| kuser | string | user name | ADMIN | D | | |
| project | string | project name | LEARN_KYLIN | D | | |
| realization | string | cube name | kylin_sales_cube_SIMPLE | D | | |
| realization_type | int | | 2 | D | | |
| query_type | string | CACHE, OLAP, LOOKUP_TABLE, HIVE (users can query different data sources) | OLAP, CACHE | D | | |
| exception | string | classifies the exception types that may occur during a query | NULL, java.lang.NumberFormatException | D | | Do all exceptions have a classification? |
| query_time_cost | bigint | the time cost of the whole query | 1392 | M | MIN/SUM/MAX/PERCENTILE_APPROX | |
| calcite_count_return | bigint | the row count of the result Calcite returns | 3 | M | SUM/MAX | The row count Calcite returns to Kylin, denoted n1 below |
| storage_count_return | bigint | the row count of the input to Calcite | 3 | M | SUM/MAX | The row count the storage layer returns to Calcite, denoted n2 below |
| calcite_count_aggregate_filter | bigint | the row count Calcite aggregates and filters | 0 | M | SUM/MAX | The rows filtered out or aggregated away inside Calcite, i.e. n2 - n1 |
| ktimestamp | bigint | query begin time (timestamp) | 1600938970920 | | | |
| kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | | |
| kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | | |
| kweek_begin_date | string | query begin time (week, beginning on Sunday) | 2020/9/20 | D | | |
| kday_time | string | query begin time (time of day) | 17:16:10 | D | | |
| ktime_hour | int | query begin time (hour) | 17 | D | | |
| ktime_minute | int | query begin time (minute) | 16 | D | | |
| ktime_second | int | query begin time (second) | 10 | | | |
| kday_date | string | query begin time (day) | 2020/9/24 | | | Hive table partition column |
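The relationship between the three row counts above (n1, n2, and the aggregated/filtered difference) can be sketched as follows, with made-up numbers:

```python
# Rows the storage layer returns to Calcite (n2) vs. rows Calcite finally
# returns to Kylin (n1); the difference is what Calcite aggregated or filtered.
storage_count_return = 100   # n2, input rows to Calcite (hypothetical)
calcite_count_return = 40    # n1, result rows Calcite returns (hypothetical)

calcite_count_aggregate_filter = storage_count_return - calcite_count_return
print(calcite_count_aggregate_filter)  # 60
```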
hive_metrics_query_cube_qa
| Column | Type | Description | Sample | D/M | Measure Function | My question |
|---|---|---|---|---|---|---|
| host | string | the host of the server for the query engine | cdh-client:10.1.3.91 | | | |
| project | string | project name | PEARVIDEOAPP | | | |
| cube_name | string | cube name | UserActionPhaseOneCube | D | | |
| segment_name | string | segment name | 20201011000000_20201012000000 | D | | |
| cuboid_source | bigint | source cuboid parsed from the query and the Cube design | 12582912 | D | | The cuboid that best matches the query; it may not actually have been precalculated |
| cuboid_target | bigint | target cuboid already precalculated and serving the source cuboid | 13041664 | D | | The cuboid actually used; answering the query may require further post-aggregation |
| if_match | boolean | whether source cuboid and target cuboid are equal | FALSE | D | | |
| filter_mask | bigint | | 4194304 | D | | |
| if_success | boolean | whether a query on this Cube is successful or not | TRUE | D | | |
| weight_per_hit | double | the reciprocal of the number of Cubes hit by a single query | 1 | M | SUM | |
| storage_call_count | bigint | the number of rpc calls for a query hit on this Cube | 1 | M | SUM/MAX | |
| storage_call_time_sum | bigint | sum of the time cost of the rpc calls of a query | 268 | M | SUM/MAX | |
| storage_call_time_max | bigint | max of the time cost among the rpc calls of a query | 268 | M | SUM/MAX | |
| storage_count_skip | bigint | the sum of the row count skipped by the related rpc calls | 0 | M | SUM/MAX | |
| storage_count_scan | bigint | the sum of the row count scanned by the related rpc calls | 929 | M | SUM/MAX | |
| storage_count_return | bigint | the sum of the row count returned by the related rpc calls | 45 | M | SUM/MAX | |
| storage_count_aggregate_filter | bigint | the sum of the row count aggregated and filtered by the related rpc calls, = STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN | 884 | M | SUM/MAX | |
| storage_count_aggregate | bigint | the sum of the row count aggregated by the related rpc calls | 36 | M | SUM/MAX | |
| ktimestamp | bigint | query begin time (timestamp) | 1603462676906 | | | |
| kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | | |
| kmonth_begin_date | string | query begin time (month) | 2020/10/1 | D | | |
| kweek_begin_date | string | query begin time (week, beginning on Sunday) | 2020/10/18 | D | | |
| kday_time | string | query begin time (time of day) | 22:17:56 | D | | |
| ktime_hour | int | query begin time (hour) | 22 | D | | |
| ktime_minute | int | query begin time (minute) | 17 | D | | |
| ktime_second | int | query begin time (second) | 56 | | | |
| kday_date | string | query begin time (day) | 2020/10/23 | | | Hive table partition column |
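Two of the columns above are derived directly from the others; a minimal sketch, using the sample values from the table plus a hypothetical Cube-hit count:

```python
# weight_per_hit: reciprocal of the number of Cubes a single query hits,
# so that summing it over the rows of one query weights each query equally.
cubes_hit_by_query = 4                   # hypothetical
weight_per_hit = 1 / cubes_hit_by_query  # 0.25

# storage_count_aggregate_filter = STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN,
# using the sample values 929 and 45 from the table.
storage_count_scan = 929
storage_count_return = 45
storage_count_aggregate_filter = storage_count_scan - storage_count_return
print(weight_per_hit, storage_count_aggregate_filter)  # 0.25 884
```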
hive_metrics_query_rpc_qa
| Column | Type | Description | Sample | D/M | Measure Function | My question |
|---|---|---|---|---|---|---|
| host | string | the host of the server for the query engine | cdh-client:10.1.3.91 | D | | |
| project | string | project name | LEARN_KYLIN | D | | |
| realization | string | cube name | kylin_sales_cube_SIMPLE | D | | |
| rpc_server | string | the related rpc target server | cdh-worker-2 | D | | |
| exception | string | the exception of an rpc call; if there is no exception, "NULL" is used | NULL | D | | |
| call_time | bigint | the time cost of an rpc call | 60 | M | SUM/MAX/PERCENTILE_APPROX | |
| count_return | bigint | the row count actually returned | 3 | M | SUM/MAX | |
| count_scan | bigint | the row count actually scanned | 3 | M | SUM/MAX | |
| count_skip | bigint | the row count skipped based on fuzzy filters or similar mechanisms | 0 | M | SUM/MAX | |
| count_aggregate_filter | bigint | the row count actually aggregated and filtered, = COUNT_SCAN - COUNT_RETURN | 0 | M | SUM/MAX | |
| count_aggregate | bigint | the row count actually aggregated | 0 | M | SUM/MAX | |
| ktimestamp | bigint | query begin time (timestamp) | 1600938970918 | | | |
| kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | | |
| kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | | |
| kweek_begin_date | string | query begin time (week, beginning on Sunday) | 2020/9/20 | D | | |
| kday_time | string | query begin time (time of day) | 17:16:10 | D | | |
| ktime_hour | int | query begin time (hour) | 17 | D | | |
| ktime_minute | int | query begin time (minute) | 16 | D | | |
| ktime_second | int | query begin time (second) | 10 | | | |
| kday_date | string | query begin time (day) | 2020/9/24 | | | Hive table partition column |
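Since each row of hive_metrics_query_rpc_qa describes one rpc call, per-server health can be summarized with the listed measure functions such as SUM and MAX over call_time. A sketch with made-up servers and call times:

```python
from collections import defaultdict

# (rpc_server, call_time in ms) pairs, one per rpc call -- hypothetical data
calls = [
    ("cdh-worker-1", 60),
    ("cdh-worker-2", 30),
    ("cdh-worker-1", 90),
    ("cdh-worker-2", 45),
]

# SUM and MAX of call_time per rpc target server
call_time_sum = defaultdict(int)
call_time_max = defaultdict(int)
for server, t in calls:
    call_time_sum[server] += t
    call_time_max[server] = max(call_time_max[server], t)

print(dict(call_time_sum))  # {'cdh-worker-1': 150, 'cdh-worker-2': 75}
print(dict(call_time_max))  # {'cdh-worker-1': 90, 'cdh-worker-2': 45}
```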
hive_metrics_job_qa
| Column | Type | Description | Sample | D/M | Measure Function | My question |
|---|---|---|---|---|---|---|
| job_id | string | job id | 51b40173-1f6c-7e55-e0ca-fbc84d242ac0 | | | |
| host | string | the host of the server for the job engine | cdh-client:10.1.3.91 | | | |
| kuser | string | user name | ADMIN | D | | |
| project | string | project name | LEARN_KYLIN | D | | |
| cube_name | string | cube name | kylin_sales_cube_poi | D | | |
| job_type | string | job type: build, merge, or optimize | BUILD | D | | Are these the only three types? |
| cubing_type | string | in Kylin, there are two cubing algorithms: Layered and Fast (In-Memory) | NULL | D | | I don't understand this field |
| duration | bigint | the duration from job start to finish | 945001 | M | SUM/MAX/MIN/PERCENTILE_APPROX | |
| table_size | bigint | the size of the data source in bytes | 227964845 | M | SUM/MAX/MIN | |
| cube_size | bigint | the size of the created Cube segment in bytes | 35693596 | M | SUM/MAX/MIN | |
| per_bytes_time_cost | double | DURATION / TABLE_SIZE | 0.00414538 | M | SUM/MAX/MIN | |
| wait_resource_time | bigint | a job may include several MR (MapReduce) jobs, which may wait due to a lack of Hadoop resources | 158146 | M | SUM/MAX/MIN | |
| step_duration_distinct_columns | bigint | duration of the distinct-columns extraction step | 138586 | M | SUM/MAX | I know these are durations of individual steps, but I don't know how the steps are divided; an actual job has more than 4 steps |
| step_duration_dictionary | bigint | duration of the dictionary building step | 5311 | M | SUM/MAX | |
| step_duration_inmem_cubing | bigint | duration of the in-memory cubing step | 89 | M | SUM/MAX | |
| step_duration_hfile_convert | bigint | duration of the HFile conversion step | 75382 | M | SUM/MAX | |
| ktimestamp | bigint | job begin time (timestamp) | 1600938458385 | | | |
| kyear_begin_date | string | job begin time (year) | 2020/1/1 | D | | |
| kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D | | |
| kweek_begin_date | string | job begin time (week, beginning on Sunday) | 2020/9/20 | D | | |
| kday_time | string | job begin time (time of day) | 17:07:38 | D | | |
| ktime_hour | int | job begin time (hour) | 17 | D | | |
| ktime_minute | int | job begin time (minute) | 7 | D | | |
| ktime_second | int | job begin time (second) | 38 | | | |
| kday_date | string | job begin time (day) | 2020/9/24 | | | Hive table partition column |
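per_bytes_time_cost is simply the build duration divided by the source size. With the sample values from the table, this reproduces the sample 0.00414538:

```python
duration = 945001       # ms, sample value from the table
table_size = 227964845  # bytes, sample value from the table

per_bytes_time_cost = duration / table_size
print(round(per_bytes_time_cost, 8))  # 0.00414538
```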
hive_metrics_job_exception_qa
| Column | Type | Description | Sample | D/M | Measure Function | My question |
|---|---|---|---|---|---|---|
| job_id | string | job id | a333a36d-8e33-811f-9326-f04579cc2464 | | | |
| host | string | the host of the server for the job engine | cdh-client:10.1.3.91 | | | |
| kuser | string | user name | ADMIN | D | | |
| project | string | project name | LEARN_KYLIN | D | | |
| cube_name | string | cube name | kylin_sales_cube | D | | |
| job_type | string | job type: build, merge, or optimize | BUILD | D | | |
| cubing_type | string | in Kylin, there are two cubing algorithms: Layered and Fast (In-Memory) | LAYER | D | | |
| exception | string | exceptions may occur while running a job; this classifies the different exception types | org.apache.kylin.job.exception.ExecuteException | D | | |
| ktimestamp | bigint | job begin time (timestamp) | 1600936844611 | | | |
| kyear_begin_date | string | job begin time (year) | 2020/1/1 | D | | |
| kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D | | |
| kweek_begin_date | string | job begin time (week, beginning on Sunday) | 2020/9/20 | D | | |
| kday_time | string | job begin time (time of day) | 16:40:44 | D | | |
| ktime_hour | int | job begin time (hour) | 16 | D | | |
| ktime_minute | int | job begin time (minute) | 40 | D | | |
| ktime_second | int | job begin time (second) | 44 | D | | |
| kday_date | string | job begin time (day) | 2020/9/24 | | | Hive table partition column |