KIP-4: Refactor System Cube for Kylin 4

Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.

The system cube is a set of cubes that Kylin creates to monitor itself. It has been supported since Kylin 2.3.0. For more information about the system cube, please refer to http://kylin.apache.org/docs/tutorial/setup_systemcube.html.

In Kylin 2.x and 3.x, cuboid data is stored in HBase, so the query metrics collected by the system cube are mostly related to HBase RPC. Kylin 4 implements a new build and query engine in which HBase storage has been replaced by Parquet storage, so the original metrics no longer exist. To make the system cube work in Kylin 4 and help users monitor builds and queries, we need to redefine the system cube's query metrics. The structure of the three query-related Hive tables that the system cube depends on will change accordingly.

Q2. What problem is this proposal NOT designed to solve?

The dimensions and measures of the new system cube will change substantially. Keeping it compatible with the system cube of versions before Kylin 4 is not a goal of this proposal.

Q3. How is it done today, and what are the limits of current practice?

In Kylin 4, most of the query metrics collected by the system cube come from Spark, and the three query-related Hive tables have evolved into three tables that correspond to the query execution level, the Spark job level, and the Spark stage level. Because the dimensions and measures of the new system cube change substantially, the existing system cube tables will no longer be usable after users upgrade from a version before Kylin 4 to Kylin 4.
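To make the relationship between the three levels concrete, the following Java sketch models how one query fans out into Spark jobs and each job into stages, and how a stage-level value can be rolled up to the job and query levels. All class and field names (for example `executorRunTimeMs`) are hypothetical and chosen only because they sum cleanly; the real table schemas are the ones in the attachment.

```java
import java.util.List;

// Hypothetical model of the three metric levels (query execution, Spark job, Spark stage).
// Names are illustrative only; they are not the KIP's actual table columns.
public class MetricsHierarchy {

    // Stage level: one row per Spark stage.
    record StageMetrics(int stageId, long executorRunTimeMs) {}

    // Job level: a Spark job consists of one or more stages.
    record JobMetrics(int jobId, List<StageMetrics> stages) {
        // Roll stage values up to the job.
        long executorRunTimeMs() {
            return stages.stream().mapToLong(StageMetrics::executorRunTimeMs).sum();
        }
    }

    // Query execution level: a single query may trigger several Spark jobs.
    record QueryMetrics(String queryId, List<JobMetrics> jobs) {
        // Roll job values up to the query.
        long executorRunTimeMs() {
            return jobs.stream().mapToLong(JobMetrics::executorRunTimeMs).sum();
        }

        // Total number of stages across all jobs of this query.
        int stageCount() {
            return jobs.stream().mapToInt(j -> j.stages().size()).sum();
        }
    }

    public static void main(String[] args) {
        JobMetrics job0 = new JobMetrics(0, List.of(
                new StageMetrics(0, 120),
                new StageMetrics(1, 80)));
        JobMetrics job1 = new JobMetrics(1, List.of(new StageMetrics(2, 50)));
        QueryMetrics query = new QueryMetrics("q-001", List.of(job0, job1));
        System.out.println(query.queryId() + " executorRunTimeMs=" + query.executorRunTimeMs()
                + " stages=" + query.stageCount()); // q-001 executorRunTimeMs=250 stages=3
    }
}
```

Because one query can produce several jobs and one job several stages, each level naturally has its own row count, which is one reason to keep them in separate tables rather than a single wide table.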

Q4. What is new in your approach and why do you think it will be successful?

New metrics are added, and metrics that are no longer available are removed. Apache Spark provides listeners for tracking job and task metrics. Kylin 4 will implement a listener that collects metrics during query execution and writes them to Hive tables, which can then be used like any other Hive table.
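A minimal Java sketch of the collect-then-write pattern, assuming the listener buffers per-job rows keyed by query ID and flushes them in one batch when the query ends. It deliberately does not depend on Spark: in Spark itself a listener extends `org.apache.spark.scheduler.SparkListener` and overrides callbacks such as `onJobEnd` and `onStageCompleted`, whereas the `onJobEnd(String, ...)` and `onQueryEnd` methods here are hypothetical stand-ins, and the `sink` stands in for the component that writes to Hive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical collector illustrating buffer-per-query, flush-on-finish.
// Not Spark's API: the callbacks and MetricRow are placeholders for illustration.
public class QueryMetricsCollector {

    // Hypothetical flattened row destined for a job-level metrics table.
    record MetricRow(String queryId, int jobId, long durationMs, boolean succeeded) {}

    private final Map<String, List<MetricRow>> pending = new HashMap<>();
    private final Consumer<List<MetricRow>> sink; // stands in for the Hive writer

    QueryMetricsCollector(Consumer<List<MetricRow>> sink) {
        this.sink = sink;
    }

    // Called once per finished Spark job that belongs to a query.
    void onJobEnd(String queryId, int jobId, long durationMs, boolean succeeded) {
        pending.computeIfAbsent(queryId, k -> new ArrayList<>())
               .add(new MetricRow(queryId, jobId, durationMs, succeeded));
    }

    // Called when the query finishes: write all buffered rows in one batch.
    void onQueryEnd(String queryId) {
        List<MetricRow> rows = pending.remove(queryId);
        if (rows != null && !rows.isEmpty()) {
            sink.accept(rows);
        }
    }

    public static void main(String[] args) {
        List<MetricRow> written = new ArrayList<>();
        QueryMetricsCollector collector = new QueryMetricsCollector(written::addAll);
        collector.onJobEnd("q-001", 0, 200, true);
        collector.onJobEnd("q-001", 1, 150, true);
        collector.onJobEnd("q-002", 2, 90, false); // still running, not flushed yet
        collector.onQueryEnd("q-001");
        System.out.println("rows written: " + written.size()); // rows written: 2
    }
}
```

In a real deployment the callbacks arrive on Spark's listener bus thread, and each Spark job has to be mapped back to the query that triggered it (for example through job properties such as the job group); both concerns are left out of this sketch.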

Q5. Who cares? If you are successful, what difference will it make?

Users who want to monitor Kylin itself will care about this. Users who already rely on the system cube in Kylin 2.x or 3.x should note that, because the system cube is refactored in Kylin 4, their previous system cube will not be available after upgrading to Kylin 4.

The structure of the three modified query-metrics Hive tables can be found in the attachment.

Q6. What are the risks?

There are no risks other than the compatibility issue described above.

Q7. How long will it take?

About two weeks.

Q8. How does it work?

1. Provide scripts for users to create the Hive tables and cube metadata that the system cube depends on.
2. Implement a listener in Kylin, based on the listener mechanism provided by Spark, that collects metrics during query execution and writes them to Hive.
3. Query the collected metrics by executing SQL in Kylin.
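The first step, providing table-creation scripts, could be served by a small generator that emits Hive DDL for the three tables. This Java sketch is only an illustration under stated assumptions: the database name `kylin_metrics`, the table names, the columns, and the `day_date` partition column are all hypothetical placeholders, since the actual schemas are defined in the attachment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generator for the Hive DDL a setup script could run.
// All database, table, and column names are placeholders, not the KIP's real schemas.
public class SystemCubeDdl {

    // Builds an ordered column map from alternating name/type arguments.
    static Map<String, String> columns(String... nameTypePairs) {
        Map<String, String> cols = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            cols.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return cols;
    }

    // Renders one CREATE TABLE statement, partitioned by a hypothetical day column.
    static String createTable(String db, String table, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(",\n  ", "(\n  ", "\n)");
        columns.forEach((name, type) -> cols.add(name + " " + type));
        return "CREATE TABLE IF NOT EXISTS " + db + "." + table + " " + cols
                + "\nPARTITIONED BY (day_date STRING);";
    }

    public static void main(String[] args) {
        String db = "kylin_metrics";
        // One table per level: query execution, Spark job, Spark stage.
        System.out.println(createTable(db, "query_execution",
                columns("query_id", "STRING", "duration_ms", "BIGINT")));
        System.out.println(createTable(db, "spark_job",
                columns("query_id", "STRING", "job_id", "INT", "duration_ms", "BIGINT")));
        System.out.println(createTable(db, "spark_stage",
                columns("query_id", "STRING", "job_id", "INT", "stage_id", "INT",
                        "duration_ms", "BIGINT")));
    }
}
```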

References

http://kylin.apache.org/docs/tutorial/setup_systemcube.html

https://kb.databricks.com/metrics/explore-spark-metrics.html
