

Background

To make troubleshooting easier, Kylin collects all runtime logs into kylin.log, including the logs generated by Spark during build jobs and query jobs; all of these are managed by Kylin's logging system.

In previous Kylin versions, the logs of the build engine and the query engine were both collected in kylin.log. Since Kylin 4 uses Spark as its build engine, the build job logs in Kylin 4 are Spark job logs, consisting of the Spark driver log and the Spark executor logs. The logs produced during query execution mainly come from Sparder, the query engine of Kylin 4.

Because a build job produces a large amount of Spark logs, writing them into kylin.log together with all the other logs makes the file very large and its content cluttered, which is not conducive to problem analysis.

To solve this problem, Kylin 4.0.0 refactored the logging of build jobs. After the refactoring, build job logs in Kylin 4 are separated from kylin.log and uploaded to HDFS. This logging is controlled by two log4j configuration files: ${KYLIN_HOME}/conf/spark-driver-log4j.properties and ${KYLIN_HOME}/conf/spark-executor-log4j.properties.
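As a quick check (assuming a standard Kylin 4 installation where KYLIN_HOME points to the Kylin installation directory), you can confirm that both files are in place before editing them:

Check log4j configuration files
ls -l ${KYLIN_HOME}/conf/spark-driver-log4j.properties ${KYLIN_HOME}/conf/spark-executor-log4j.properties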

Driver log

SparkDriverHdfsLogAppender

spark-driver-log4j.properties configures the output path, appender, layout, etc. of the Spark driver log in a build job. By default, the Spark driver log of each step of a build job is written to a file on HDFS. The file path is composed of kylin.env.hdfs-working-dir, kylin.metadata.url, the project name, the step id, and so on, where the step id is the job id followed by a two-digit counter starting from 00: the first step of a build job has step id jobId-00, the second step has step id jobId-01, and so on. The full path of the log file is: ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${step_id}/execute_output.json.timestamp.log.
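For example, the driver log of a step can be listed directly on HDFS with standard Hadoop commands. This is only a sketch: the working directory, metadata URL, project name and step id below are placeholders that must be replaced with the actual values of your environment.

List driver logs on HDFS
hadoop fs -ls ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${step_id}/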

View logs through Kylin WebUI

When SparkDriverHdfsLogAppender is enabled, users can download the driver log from Kylin's WebUI, even when spark.submit.deployMode is cluster (that is, the driver is not running on the same node as the Kylin job server).

By default, the Output window only shows the first and last 100 lines of all logs of this step. If you need to view the full log, click "download the log file" at the top of the Output window, and the browser will download the complete Spark driver log file of this step to your local machine.


ConsoleAppender

If you do not want the Spark driver log of build jobs to be uploaded to HDFS, change the following configuration item in spark-driver-log4j.properties:

Modify spark-driver-log4j.properties
vim ${KYLIN_HOME}/conf/spark-driver-log4j.properties
log4j.rootLogger=INFO,logFile

After modifying the configuration, restart Kylin; the Spark driver log of each step of a job will then be written to a local file: ${KYLIN_HOME}/logs/spark/${step_id}.log.
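For example, you can then follow the local driver log of a running step with tail (the step id below is a placeholder of the form jobId-00, jobId-01, ...):

View local driver log
tail -f ${KYLIN_HOME}/logs/spark/${step_id}.log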


Executor log

SparkExecutorHdfsAppender

spark-executor-log4j.properties configures the output path, appender, layout, etc. of the Spark executor logs in a build job. Similar to the Spark driver log, the Spark executor logs of each step of a build job are written to a folder on HDFS, where each file corresponds to one executor. The path is ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/yyyy-mm-dd/${job_id}/${step_id}/executor-x.log
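For example, the executor logs of a step can be listed and downloaded from HDFS with standard Hadoop commands. The date, job id, step id and executor file name below are placeholders that must be replaced with the actual values of your job.

List and download executor logs
hadoop fs -ls ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/yyyy-mm-dd/${job_id}/${step_id}/
hadoop fs -get ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/yyyy-mm-dd/${job_id}/${step_id}/executor-1.log /tmp/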

Troubleshooting

When the Spark job submitted by Kylin runs on a YARN cluster, the user that uploads the Spark executor logs to HDFS may be yarn. The yarn user may not have write permission on the HDFS directory ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs, which causes the upload of the Spark executor logs to fail. In this case, when viewing the task log with "yarn logs -applicationId <Application ID>", you will see an error indicating that the yarn user does not have write permission on this directory.

This error can be solved by the following command:

acl
hadoop fs -setfacl -R -m user:yarn:rwx ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs
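
After running the command, you can verify that the yarn user now has rwx access on the directory (again, the path placeholders must be replaced with your actual values):

Verify acl
hadoop fs -getfacl ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs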