
Apache Kylin : Analytical Data Warehouse for Big Data


Welcome to Kylin Wiki.

Background

In previous Kylin releases, the logs of Kylin's build engine and query engine were collected and stored by the resource manager (e.g. yarn logs -applicationId xxx) or by HBase Region Server instances. This made it difficult to find the root cause of a failed job or a slow query.

To solve this problem, Kylin 4.0.0 refactored the logging of build jobs. In Kylin 4.0.0, these logs are collected and stored under Kylin's working directory (HDFS or S3). Two log4j configuration files control this behavior: ${KYLIN_HOME}/conf/spark-driver-log4j.properties and ${KYLIN_HOME}/conf/spark-executor-log4j.properties.


Driver log

SparkDriverHdfsLogAppender

spark-driver-log4j.properties configures the output path, appender, layout, etc. of the Spark driver log in a build job. By default, the Spark driver log of each step of a build job is written to a file in HDFS. The file path is composed of kylin.env.hdfs-working-dir, kylin.metadata.url, the project name, the step id, etc. The step id is the job id followed by a two-digit counter starting at 00: a build job's first step has step id jobId-00, its second step jobId-01, and so on. The full path of the log file is: ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${step_id}/execute_output.json.timestamp.log.
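For reference, once a step has run, its driver log can also be inspected directly with the HDFS CLI. In the sketch below, the path components (working directory, metadata url, project name, job id and timestamp) are placeholders that must be replaced with your own values:

Inspect the driver log on HDFS
# list the driver log directory of the first step (step id jobId-00) of a build job
hadoop fs -ls ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${job_id}-00/
# print a specific log file
hadoop fs -cat ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${job_id}-00/execute_output.json.${timestamp}.log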

View logs through kylin WebUI

When SparkDriverHdfsLogAppender is enabled, users can download driver logs from Kylin's Web UI, even when spark.submit.deployMode is cluster (meaning the driver is not located on the same node as the Kylin job server).

By default, the Output window shows only the first and last 100 lines of all logs of this step. If you need to view the full log, click "download the log file" at the top of the Output window; the browser will then download the complete Spark driver log file of this step to the local machine.

 

ConsoleAppender

If you do not want the Spark driver log uploaded to HDFS during build jobs, change the configuration in spark-driver-log4j.properties:

Modify spark-driver-log4j.properties
vim ${KYLIN_HOME}/conf/spark-driver-log4j.properties
log4j.rootLogger=INFO,logFile

After modifying the configuration, restart Kylin; the Spark driver log of each job step will then be written to the local file ${KYLIN_HOME}/logs/spark/${step_id}.log.
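Since the log is now a plain local file, it can be followed while the step runs; ${step_id} below is a placeholder for the actual step id, e.g. jobId-00:

Tail the local driver log
tail -f ${KYLIN_HOME}/logs/spark/${step_id}.log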


Executor log

SparkExecutorHdfsLogAppender

spark-executor-log4j.properties configures the output path, appender, layout, etc. of the Spark executor log in a build job. Similar to the Spark driver log, the Spark executor log of each step of a build job is written to a folder in HDFS, where each file corresponds to one executor's log. The path is ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/yyyy-mm-dd/${job_id}/${step_id}/executor-x.log
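As with the driver log, the executor logs of a step can be listed and read with the HDFS CLI. The path components below are placeholders, and the sketch assumes the step ran today, since the date folder uses the yyyy-mm-dd format described above:

Inspect executor logs on HDFS
# list all executor log files of one step (one file per executor)
hadoop fs -ls ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/$(date +%Y-%m-%d)/${job_id}/${step_id}/
# print the log of the first executor
hadoop fs -cat ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/$(date +%Y-%m-%d)/${job_id}/${step_id}/executor-1.log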




Troubleshooting

When the Spark job submitted by Kylin runs on a YARN cluster, the user that uploads the Spark executor log to HDFS may be yarn. The yarn user may not have write permission on the HDFS directory ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs, which causes the upload of the Spark executor log to fail. In that case, viewing the task log with "yarn logs -applicationId <Application ID>" shows a permission-denied error from HDFS.

This error can be fixed with the following command:

Grant write permission with an ACL
hadoop fs -setfacl -R -m user:yarn:rwx ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs
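
To confirm the ACL has been applied, the directory's ACL entries can be checked with getfacl (path placeholders as above):

Verify the ACL
hadoop fs -getfacl ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs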