Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
There are 5 Hive tables recording System Cube data: 3 tables record query-related metrics and 2 tables record job-related metrics. When Cube Planner is enabled, a project named KYLIN_SYSTEM appears in Kylin; it contains 5 Cubes, one for each of these Hive tables.
Hive table descriptions
Table Name | Type | Description | Row Granularity | My Question
---|---|---|---|---
hive_metrics_query_qa | Query | query metrics at the highest level | one row per query |
hive_metrics_query_cube_qa | Query | collects query metrics at the Cube level; most important are the cuboid-related metrics, which serve Cube Planner | one row per query | Is a row only generated when the query hits a cube? If a query is pushed down, is it recorded here?
hive_metrics_query_rpc_qa | Query | collects query metrics at the lowest level. For one query, the related aggregation and filter can be pushed down to each rpc target server; the robustness of the rpc target servers is the foundation of good query performance. | at least one row per query | How should "RPC" be understood here?
hive_metrics_job_qa | Job | collects metrics of successful jobs | one row per successful job |
hive_metrics_job_exception_qa | Job | collects metrics of failed jobs | one row per failed job |
Relationship between Hive table columns and Kylin Cube information
hive_metrics_query_qa
Column | Type | Description | Sample (comma-separated samples) | D/M | Measure Function | My Question |
---|---|---|---|---|---|---|
query_hash_code | bigint | query unique id | 7708685990456150000 | M | COUNT_DISTINCT | Is there one hash_code per query? If exactly the same query is run twice, is that one hash_code or two? |
host | string | the host of server for query engine | cdh-client:10.1.3.91 | D | ||
kuser | string | user name | ADMIN | D | ||
project | string | project name | LEARN_KYLIN | D | ||
realization | string | Cube or Hybrid of Cubes (in Kylin, there are two OLAP realizations) | kylin_sales_cube_SIMPLE | D | | Is this really the cube name? Other tables have a cube_name column; why is this table different? |
realization_type | int | | 2 | D | | I don't understand this field. |
query_type | string | CACHE,OLAP,LOOKUP_TABLE,HIVE (users can query on different data sources) | OLAP, CACHE | D | ||
exception | string | It's for classifying different exception types (when doing a query, exceptions may happen) | NULL, java.lang.NumberFormatException | D | | Does every exception fall into a category? |
query_time_cost | bigint | the time cost for the whole query | 1392 | M | MIN/SUM/MAX/PERCENTILE_APPROX | |
calcite_count_return | bigint | the row count of the result Calcite returns | 3 | M | SUM/MAX | I don't understand this. |
storage_count_return | bigint | the row count of the input to Calcite | 3 | M | SUM/MAX | I don't understand this. |
calcite_count_aggregate_filter | bigint | the row count Calcite aggregates and filters | 0 | M | SUM/MAX | I don't understand this, mainly because I don't understand how Calcite executes. |
ktimestamp | bigint | query begin time (timestamp) | 1600938970920 | |||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | ||
kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | ||
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/9/20 | D ||
kday_time | string | query begin time (time) | 17:16:10 | D | ||
ktime_hour | int | query begin time (hour) | 17 | D | ||
ktime_minute | int | query begin time (minute) | 16 | D | ||
ktime_second | int | query begin time (second) | 10 | |||
kday_date | string | query begin time (day) | 2020/9/24 | Hive table partition column |
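All of the k*-prefixed time columns above are derived from the single `ktimestamp` value. A minimal sketch of that derivation, assuming the samples were rendered in the Kylin server's local time zone (UTC+8 matches the samples; the offset is an assumption, not documented behavior):

```python
from datetime import datetime, timedelta, timezone

def derive_time_columns(ktimestamp_ms, tz_offset_hours=8):
    """Derive the k* time columns from a millisecond timestamp.

    Assumption: samples were rendered in the server's local zone
    (UTC+8 here); adjust tz_offset_hours for your deployment.
    """
    tz = timezone(timedelta(hours=tz_offset_hours))
    t = datetime.fromtimestamp(ktimestamp_ms / 1000, tz)

    def fmt(d):
        # Samples use unpadded year/month/day, e.g. "2020/9/1".
        return f"{d.year}/{d.month}/{d.day}"

    # Week begins on Sunday: step back (weekday() + 1) % 7 days
    # (Python's weekday() is 0 for Monday, 6 for Sunday).
    week_begin = t - timedelta(days=(t.weekday() + 1) % 7)
    return {
        "kyear_begin_date": fmt(t.replace(month=1, day=1)),
        "kmonth_begin_date": fmt(t.replace(day=1)),
        "kweek_begin_date": fmt(week_begin),
        "kday_date": fmt(t),
        "kday_time": t.strftime("%H:%M:%S"),
        "ktime_hour": t.hour,
        "ktime_minute": t.minute,
        "ktime_second": t.second,
    }

# Sample ktimestamp from the table above.
print(derive_time_columns(1600938970920)["kweek_begin_date"])  # 2020/9/20
```

Feeding in the sample `ktimestamp` 1600938970920 reproduces the sample values of the other time columns in this table.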
hive_metrics_query_cube_qa
Column | Type | Description | Sample | D/M | Measure Function | My Question |
---|---|---|---|---|---|---|
host | string | the host of server for query engine | cdh-client:10.1.3.91 | |||
project | string | project name | PEARVIDEOAPP | |||
cube_name | string | cube name | UserActionPhaseOneCube | D | ||
segment_name | string | segment name | 20201011000000_20201012000000 | D | ||
cuboid_source | bigint | source cuboid parsed based on query and Cube design | 12582912 | D | | Which cuboid is the source cuboid? |
cuboid_target | bigint | target cuboid already precalculated and served for source cuboid | 13041664 | D | | Which cuboid is the target cuboid? |
if_match | boolean | whether source cuboid and target cuboid are equal | FALSE | D ||
filter_mask | bigint | | 4194304 | D ||
if_success | boolean | whether a query on this Cube is successful or not | TRUE | D | | I don't understand this. |
weight_per_hit | double | | 1 | M | SUM | If a query is pushed down, is it recorded here? |
storage_call_count | bigint | the number of rpc calls for a query hit on this Cube | 1 | M | SUM/MAX | |
storage_call_time_sum | bigint | sum of time cost for the rpc calls of a query | 268 | M | SUM/MAX | |
storage_call_time_max | bigint | max of time cost among the rpc calls of a query | 268 | M | SUM/MAX ||
storage_count_skip | bigint | the sum of row count skipped for the related rpc calls | 0 | M | SUM/MAX | |
storage_count_scan | bigint | the sum of row count scanned for the related rpc calls | 929 | M | SUM/MAX | |
storage_count_return | bigint | the sum of row count returned for the related rpc calls | 45 | M | SUM/MAX | |
storage_count_aggregate_filter | bigint | the sum of row count aggregated and filtered for the related rpc calls, = STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN | 884 | M | SUM/MAX ||
storage_count_aggregate | bigint | the sum of row count aggregated for the related rpc calls | 36 | M | SUM/MAX | |
ktimestamp | bigint | query begin time (timestamp) | 1603462676906 | |||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | ||
kmonth_begin_date | string | query begin time (month) | 2020/10/1 | D | ||
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/10/18 | D ||
kday_time | string | query begin time (time) | 22:17:56 | D | ||
ktime_hour | int | query begin time (hour) | 22 | D | ||
ktime_minute | int | query begin time (minute) | 17 | D | ||
ktime_second | int | query begin time (second) | 56 | |||
kday_date | string | query begin time (day) | 2020/10/23 | Hive table partition column |
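One way to read `cuboid_source` and `cuboid_target` is as bitmasks: each bit position marks whether one cube dimension is present in the cuboid, and the precalculated target cuboid must cover every dimension of the source cuboid. A sketch using the sample values from the table above (the bit-to-dimension mapping itself is cube-specific, so the positions are illustrative only):

```python
def dims(cuboid_id):
    # Bit positions set in the cuboid id; each position maps to one
    # dimension of the cube (which one depends on the cube design).
    return {i for i in range(cuboid_id.bit_length()) if (cuboid_id >> i) & 1}

cuboid_source, cuboid_target = 12582912, 13041664  # sample row above

print(sorted(dims(cuboid_source)))                 # [22, 23]
print(sorted(dims(cuboid_target)))                 # [16, 17, 18, 22, 23]
print(dims(cuboid_source) <= dims(cuboid_target))  # True: target covers source
print(cuboid_source == cuboid_target)              # False -> if_match = FALSE
```

Under this reading, the sample row describes a query needing a 2-dimension cuboid that was answered from a precalculated 5-dimension cuboid, so `if_match` is FALSE.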
hive_metrics_query_rpc_qa
Column | Type | Description | Sample | D/M | Measure Function | My Question |
---|---|---|---|---|---|---|
host | string | the host of server for query engine | cdh-client:10.1.3.91 | D | ||
project | string | project name | LEARN_KYLIN | D | ||
realization | string | cube name | kylin_sales_cube_SIMPLE | D | ||
rpc_server | string | the rpc related target server | cdh-worker-2 | D | ||
exception | string | the exception of a rpc call. If no exception, "NULL" is used | NULL | D | ||
call_time | bigint | the time cost of an rpc call | 60 | M | SUM/MAX/PERCENTILE_APPROX ||
count_return | bigint | the row count actually return | 3 | M | SUM/MAX | |
count_scan | bigint | the row count actually scanned | 3 | M | SUM/MAX | |
count_skip | bigint | a few rows may be skipped based on fuzzy filters and the like; this is the skipped row count | 0 | M | SUM/MAX ||
count_aggregate_filter | bigint | the row count actually aggregated and filtered, = COUNT_SCAN - COUNT_RETURN | 0 | M | SUM/MAX ||
count_aggregate | bigint | the row count actually aggregated | 0 | M | SUM/MAX | |
ktimestamp | bigint | query begin time (timestamp) | 1600938970918 | |||
kyear_begin_date | string | query begin time (year) | 2020/1/1 | D | ||
kmonth_begin_date | string | query begin time (month) | 2020/9/1 | D | ||
kweek_begin_date | string | query begin time (week, begins on Sunday) | 2020/9/20 | D ||
kday_time | string | query begin time (time) | 17:16:10 | D | ||
ktime_hour | int | query begin time (hour) | 17 | D | ||
ktime_minute | int | query begin time (minute) | 16 | D | ||
ktime_second | int | query begin time (second) | 10 | |||
kday_date | string | query begin time (day) | 2020/9/24 | Hive table partition column |
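The per-call counters above obey the stated bookkeeping rule `count_aggregate_filter = COUNT_SCAN - COUNT_RETURN`, and the Measure Function column says how the Cube rolls them up. A small sanity-check sketch over hypothetical rpc rows (the first tuple reuses the sample counts from the cube-level table; the other two are made up):

```python
# Hypothetical rpc metric rows: (count_scan, count_return, count_aggregate_filter)
rows = [(929, 45, 884), (3, 3, 0), (100, 10, 90)]

# Per-row invariant from the column description above.
for scan, ret, agg_filter in rows:
    assert agg_filter == scan - ret

# SUM/MAX aggregations, the way the Cube measures would compute them.
sum_scan = sum(scan for scan, _, _ in rows)
max_scan = max(scan for scan, _, _ in rows)
print(sum_scan, max_scan)  # 1032 929
```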
hive_metrics_job_qa
Column | Type | Description | Sample | D/M | Measure Function | My Question |
---|---|---|---|---|---|---|
job_id | string | job id | 51b40173-1f6c-7e55-e0ca-fbc84d242ac0 | |||
host | string | the host of server for job engine | cdh-client:10.1.3.91 | |||
kuser | string | user name | ADMIN | D | ||
project | string | project name | LEARN_KYLIN | D | ||
cube_name | string | cube name | kylin_sales_cube_poi | D | ||
job_type | string | job type: build, merge, optimize | BUILD | D | | Are these the only three types? |
cubing_type | string | in Kylin, there are two cubing algorithms: Layered & Fast (In-Memory) | NULL | D | | I don't understand this. |
duration | bigint | the duration from a job start to finish | 945001 | M | SUM/MAX/MIN/PERCENTILE_APPROX | |
table_size | bigint | the size of data source in bytes | 227964845 | M | SUM/MAX/MIN | |
cube_size | bigint | the size of created Cube segment in bytes | 35693596 | M | SUM/MAX/MIN | |
per_bytes_time_cost | double | DURATION / TABLE_SIZE | 0.00414538 | M | SUM/MAX/MIN | |
wait_resource_time | bigint | a job may include several MR (MapReduce) jobs; those MR jobs may wait because of a lack of Hadoop resources | 158146 | M | SUM/MAX/MIN ||
step_duration_distinct_columns | bigint | | 138586 | M | SUM/MAX | I know these are the durations of different steps, but I don't know how the steps are divided; an actual job has more than 4 steps. |
step_duration_dictionary | bigint | | 5311 | M | SUM/MAX ||
step_duration_inmem_cubing | bigint | | 89 | M | SUM/MAX ||
step_duration_hfile_convert | bigint | | 75382 | M | SUM/MAX ||
ktimestamp | bigint | job begin time (timestamp) | 1600938458385 |||
kyear_begin_date | string | job begin time (year) | 2020/1/1 | D ||
kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D ||
kweek_begin_date | string | job begin time (week, begins on Sunday) | 2020/9/20 | D ||
kday_time | string | job begin time (time) | 17:07:38 | D ||
ktime_hour | int | job begin time (hour) | 17 | D ||
ktime_minute | int | job begin time (minute) | 7 | D ||
ktime_second | int | job begin time (second) | 38 |||
kday_date | string | job begin time (day) | 2020/9/24 | Hive table partition column |
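`per_bytes_time_cost` is just the ratio of two measures in this table. Recomputing it from the sample row reproduces the sample value, assuming `duration` is in milliseconds and `table_size` in bytes as the column descriptions say:

```python
duration = 945001        # ms, sample row above
table_size = 227964845   # bytes, sample row above

# PER_BYTES_TIME_COST = DURATION / TABLE_SIZE, per the column description.
per_bytes_time_cost = duration / table_size
print(round(per_bytes_time_cost, 8))  # 0.00414538
```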
hive_metrics_job_exception_qa
Column | Type | Description | Sample | D/M | Measure Function | My Question |
---|---|---|---|---|---|---|
job_id | string | job id | a333a36d-8e33-811f-9326-f04579cc2464 | |||
host | string | the host of server for job engine | cdh-client:10.1.3.91 | |||
kuser | string | user name | ADMIN | D | ||
project | string | project name | LEARN_KYLIN | D | ||
cube_name | string | cube name | kylin_sales_cube | D | ||
job_type | string | job type: build, merge, optimize | BUILD | D | ||
cubing_type | string | in Kylin, there are two cubing algorithms: Layered & Fast (In-Memory) | LAYER | D ||
exception | string | exceptions may happen while running a job; this is for classifying different exception types | org.apache.kylin.job.exception.ExecuteException | D ||
ktimestamp | bigint | job begin time (timestamp) | 1600936844611 |||
kyear_begin_date | string | job begin time (year) | 2020/1/1 | D ||
kmonth_begin_date | string | job begin time (month) | 2020/9/1 | D ||
kweek_begin_date | string | job begin time (week, begins on Sunday) | 2020/9/20 | D ||
kday_time | string | job begin time (time) | 16:40:44 | D ||
ktime_hour | int | job begin time (hour) | 16 | D ||
ktime_minute | int | job begin time (minute) | 40 | D ||
ktime_second | int | job begin time (second) | 44 | D ||
kday_date | string | job begin time (day) | 2020/9/24 | Hive table partition column |