THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!

Apache Kylin : Analytical Data Warehouse for Big Data

Page tree

Welcome to Kylin Wiki.

我们有 5 张 Hive 表记录 System Cube 相关的数据,3 张表记录了查询相关的指标数据,2 张表记录了任务相关的指标数据。当开启 Cube Planner 后,Kylin 中会出现项目 KYLIN_SYSTEM,里面包含 5 个 Cube,即每一张 Hive 表对应 1 个 Cube。

Hive 表描述

Table NameTypeDescription
My Question
hive_metrics_query_qaQueryquery metrics at the highest level每条查询生成一条数据
hive_metrics_query_cube_qaQuery

在 Cube 级别收集查指标。最重要的是 cuboids 相关的,其为 Cube planner 提供服务

每条查询生成一条数据确定不是查询击中 cube 的才生成一条数据?如果是一条下压查询,是不是记录在此?
hive_metrics_query_rpc_qaQuery

在最低级别收集查指标。对于一个查询,相关的 aggregation 和 filter 能够下推到每一个 rpc 目标服务器。Rpc 目标服务器的健壮性是更好查询性能的基础。

每条查询生成至少一条数据RPC 怎么理解
hive_metrics_job_qaJob收集成功 job 的指标每个成功的任务生成一条数据
hive_metrics_job_exception_qaJob收集失败 job 的指标每个失败的任务生成一条数据

Hive 表列信息与 Kylin Cube 信息的关系

hive_metrics_query_qa

ColumnTypeDescriptionSample(逗号分隔不同的样例)D/M度量函数My question
query_hash_code     bigintquery unique id7708685990456150000MCOUNT_DISTINCT

每一条查询一个hash_code吗?完全一样的查询,查2次,是一个hash_code,还是2个呀?

host                stringthe host of server for query enginecdh-client:10.1.3.91D 
kuser               stringuser nameADMIND 
project             stringproject nameLEARN_KYLIND 
realization         stringCube or Hybrid of Cubes (in Kylin, there are two OLAP realizations)kylin_sales_cube_SIMPLED 

确定是cube name吗?其他表有cube_name字段,这里为什么和其他表不一样?

realization_type    int
2D 我不明白
query_type          stringCACHE,OLAP,LOOKUP_TABLE,HIVE (users can query on different data sources)OLAP, CACHED 
exception           stringIt's for classifying different exception types (when doing query, exceptions may happen)NULL, java.lang.NumberFormatExceptionD 

所有的异常都有一个分类吗?

query_time_cost     bigintthe time cost for the whole query1392MMIN/SUM/MAX/PERCENTILE_APPROX
calcite_count_returnbigintthe row count of the result Calcite returns3MSUM/MAX我不明白
storage_count_returnbigintthe row count of the input to Calcite3MSUM/MAX我不明白
calcite_count_aggregate_filterbigintthe row count of Calcite aggregates and filters0MSUM/MAX我不明白,主要是不明白calcite的执行原理
ktimestamp          bigintquery begin time (timestamp)1600938970920  
kyear_begin_date    stringquery begin time (year)2020/1/1D 
kmonth_begin_date   stringquery begin time (month)2020/9/1D 
kweek_begin_date    stringquery begin time (week, begin with sumday)2020/9/20D 
kday_time           stringquery begin time (time)17:16:10D 
ktime_hour          intquery begin time (hour)17D 
ktime_minute        intquery begin time (minute)16D 
ktime_second        intquery begin time (second)10  
kday_date           stringquery begin time (day)2020/9/24Hive 表分区列 

hive_metrics_query_cube_qa

ColumnTypeDescriptionSampleD/M度量函数My question
host                stringthe host of server for query enginecdh-client:10.1.3.91  
project             stringproject namePEARVIDEOAPP  
cube_name           stringcube nameUserActionPhaseOneCubeD 
segment_name        stringsegment name20201011000000_20201012000000D 
cuboid_source       bigintsource cuboid parsed based on query and Cube design12582912D 谁是 source cuboid
cuboid_target       biginttarget cuboid already precalculated and served for source cuboid13041664D 谁是 target cuboid
if_match            booleanwhether source cuboid and target cuboid are equalFALSED 
filter_mask         bigint 4194304D 
if_success          booleanwhether a query on this Cube is successful or notTRUED 我不明白
weight_per_hit      double 1MSUM如果是一条下压查询,是不是记录在此?
storage_call_count  bigintthe number of rpc calls for a query hit on this Cube1MSUM/MAX
storage_call_time_sumbigintsum of time cost for the rpc calls of a query268MSUM/MAX
storage_call_time_maxbigintmax of time cost among the rpc calls of a quer268MSUM/MAX
storage_count_skip  bigintthe sum of row count skipped for the related rpc calls0MSUM/MAX
storage_count_scan  bigintthe sum of row count scanned for the related rpc calls929MSUM/MAX
storage_count_returnbigintthe sum of row count returned for the related rpc calls45MSUM/MAX
storage_count_aggregate_filterbigintthe sum of row count aggregated and filtered for the related rpc calls,= STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN884MSUM/MAX
storage_count_aggregatebigintthe sum of row count aggregated for the related rpc calls36MSUM/MAX
ktimestamp          bigintquery begin time (timestamp)1603462676906  
kyear_begin_date    stringquery begin time (year)2020/1/1D 
kmonth_begin_date   stringquery begin time (month)2020/10/1D 
kweek_begin_date    stringquery begin time (week, begin with sumday)2020/10/18D 
kday_time           stringquery begin time (time)22:17:56D 
ktime_hour          intquery begin time (hour)22D 
ktime_minute        intquery begin time (minute)17D 
ktime_second        intquery begin time (second)56  
kday_date           stringquery begin time (day)2020/10/23Hive 表分区列 

hive_metrics_query_rpc_qa

ColumnTypeDescriptionSampleD/M度量函数My question
host                stringthe host of server for query enginecdh-client:10.1.3.91D 
project             stringproject nameLEARN_KYLIND 
realization         stringcube namekylin_sales_cube_SIMPLED 
rpc_server          stringthe rpc related target servercdh-worker-2D 
exception           stringthe exception of a rpc call. If no exception, "NULL" is usedNULLD 
call_time           bigintthe time cost of a rpc all60MSUM/MAX/PERCENTILE_APPROX
count_return        bigintthe row count actually return3MSUM/MAX
count_scan          bigintthe row count actually scanned3MSUM/MAX
count_skip          bigintbased on fuzzy filters or else,a few rows will be skiped. This indicates the skipped row count0MSUM/MAX
count_aggregate_filterbigintthe row count actually aggregated and filtered,= COUNT_SCAN - COUNT_RETURN0MSUM/MAX
count_aggregate     bigintthe row count actually aggregated0MSUM/MAX
ktimestamp          bigintquery begin time (timestamp)1600938970918  
kyear_begin_date    stringquery begin time (year)2020/1/1D 
kmonth_begin_date   stringquery begin time (month)2020/9/1D 
kweek_begin_date    stringquery begin time (week, begin with sumday)2020/9/20D 
kday_time           stringquery begin time (time)17:16:10D 
ktime_hour          intquery begin time (hour)17D 
ktime_minute        intquery begin time (minute)16D 
ktime_second        intquery begin time (second)10  
kday_date           stringquery begin time (day)2020/9/24Hive 表分区列 

hive_metrics_job_qa

ColumnTypeDescriptionSampleD/M度量函数My question
job_id              stringjob id51b40173-1f6c-7e55-e0ca-fbc84d242ac0  
host                stringthe host of server for job enginecdh-client:10.1.3.91  
kuser               stringuser name ADMIND 
project             stringproject nameLEARN_KYLIND 
cube_name           stringcube namekylin_sales_cube_poiD 
job_type            stringjob type: build, merge, optimizeBUILDD 只有这三种吗
cubing_type         stringin kylin,there are two cubing algorithms,Layered & Fast(InMemory)NULLD 我不明白
duration            bigintthe duration from a job start to finish945001MSUM/MAX/MIN/PERCENTILE_APPROX
table_size          bigintthe size of data source in bytes227964845MSUM/MAX/MIN
cube_size           bigintthe size of created Cube segment in bytes35693596MSUM/MAX/MIN
per_bytes_time_cost doubleDURATION / TABLE_SIZE0.00414538MSUM/MAX/MIN
wait_resource_time  biginta job may includes serveral MR(map reduce) jobs. Those MR jobs may wait because of lack of Hadoop resources.158146MSUM/MAX/MIN
step_duration_distinct_columnsbigint
138586MSUM/MAX 我知道是不同步骤的时间,但我不知道这个步骤怎么划分的,实际的 job 多于 4 步
step_duration_dictionarybigint 5311MSUM/MAX 
step_duration_inmem_cubingbigint 89MSUM/MAX 
step_duration_hfile_convertbigint 75382MSUM/MAX 
ktimestamp          bigintquery begin time (timestamp)1600938458385  
kyear_begin_date    stringquery begin time (year)2020/1/1D 
kmonth_begin_date   stringquery begin time (month)2020/9/1D 
kweek_begin_date    stringquery begin time (week, begin with sumday)2020/9/20D 
kday_time           stringquery begin time (time)17:07:38D 
ktime_hour          intquery begin time (hour)17D 
ktime_minute        intquery begin time (minute)7D 
ktime_second        intquery begin time (second)38  
kday_date           stringquery begin time (day)2020/9/24Hive 表分区列 

hive_metrics_job_exception_qa

ColumnTypeDescriptionSampleD/M度量函数My question
job_id              stringjob ida333a36d-8e33-811f-9326-f04579cc2464  
host                stringthe host of server for job enginecdh-client:10.1.3.91  
kuser               stringuser nameADMIND 
project             stringproject nameLEARN_KYLIND 
cube_name           stringcube namekylin_sales_cubeD 
job_type            stringjob type: build, merge, optimizeBUILDD 
cubing_type         stringin kylin,there are two cubing algorithms,Layered & Fast(InMemory)LAYERD 
exception           stringwhen running a job,exceptions may happen. It's for classifying different exception typesorg.apache.kylin.job.exception.ExecuteExceptionD 
ktimestamp          bigintquery begin time (timestamp)1600936844611  
kyear_begin_date    stringquery begin time (year)2020/1/1D 
kmonth_begin_date   stringquery begin time (month)2020/9/1D 
kweek_begin_date    stringquery begin time (week, begin with sumday)2020/9/20D 
kday_time           stringquery begin time (time)16:40:44D 
ktime_hour          intquery begin time (hour)16D 
ktime_minute        intquery begin time (minute)40D 
ktime_second        intquery begin time (second)44D 
kday_date           stringquery begin time (day)2020/9/24Hive 表分区列 
  • No labels