Yarn App Log
Tez propagates most exceptions to the client side, so first check the client-side log for useful information about the errors. If the client-side log does not convey much information, check the YARN application logs.
Users can run the command "yarn logs -applicationId {your_app_id}" to fetch the YARN application logs to the local machine. This command only works when YARN log aggregation is enabled. For how to enable log aggregation, refer to http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/ . If log aggregation is not enabled, you may have to find the logs on each NodeManager machine.
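Fetching the logs can be scripted. Below is a minimal Python sketch, assuming the `yarn` CLI is on the PATH and log aggregation is enabled. The helper names `yarn_logs_command` and `save_yarn_logs` are hypothetical, and the id check assumes the usual application_<clusterTimestamp>_<sequence> form. The sketch just redirects the command's standard output to a file.

```python
import re
import subprocess
from typing import List

# Assumed id format: application_<clusterTimestamp>_<sequence>.
_APP_ID = re.compile(r"^application_\d+_\d+$")


def yarn_logs_command(app_id: str) -> List[str]:
    """Build the argv for `yarn logs -applicationId <app_id>`."""
    if not _APP_ID.match(app_id):
        raise ValueError(f"not a YARN application id: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def save_yarn_logs(app_id: str, out_path: str) -> None:
    """Run `yarn logs` and write its standard output to out_path.

    Requires the `yarn` CLI on PATH and log aggregation enabled on the cluster.
    """
    with open(out_path, "w") as out:
        subprocess.run(yarn_logs_command(app_id), stdout=out, check=True)
```

For example, `save_yarn_logs("application_1422917428389_0001", "app.log")` (a made-up id) would write the aggregated logs to app.log.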
After you get the YARN application logs, first check the Tez AM log. The Tez AM is launched in the first container of the YARN application, so its logs are located in a folder like container_{yarn_app_id}_000001. Under this folder you may find the following files:
- syslog: the log from before the AM has fully started
- syslog_dag_{yarn_app_id}_{dag_id}: the log for each DAG
- syslog_dag_{yarn_app_id}_{dag_id}_post: the log for each DAG after the DAG has completed
Usually you only need to check the log of the last DAG to find the errors.
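When the AM container's log files are available as separate files (for example, in a NodeManager's container log folder), picking the last DAG log can be scripted. Below is a minimal Python sketch, assuming the file names follow the syslog_dag_{yarn_app_id}_{dag_id} pattern listed above with a numeric DAG id as the last component. `latest_dag_log` is a hypothetical helper. It compares DAG ids numerically, so DAG 10 sorts after DAG 9.

```python
import re
from typing import Iterable, Optional

# Assumed naming: syslog_dag_<app id>_<dag id>, optionally followed by "_post".
_DAG_LOG = re.compile(r"^syslog_dag_(?P<app>.+)_(?P<dag>\d+)(?P<post>_post)?$")


def latest_dag_log(filenames: Iterable[str], include_post: bool = False) -> Optional[str]:
    """Return the per-DAG log file with the highest DAG id, or None if none match."""
    best: Optional[str] = None
    best_id = -1
    for name in filenames:
        m = _DAG_LOG.match(name)
        if m is None:
            continue  # e.g. "syslog", "stderr", "stdout"
        if m.group("post") and not include_post:
            continue  # skip the post-completion logs unless asked for them
        dag_id = int(m.group("dag"))
        if dag_id > best_id:
            best, best_id = name, dag_id
    return best
```

For example, `latest_dag_log(os.listdir(container_dir))`, where container_dir is the AM container folder, returns the name of the log to open first.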
Tez-UI
Starting from 0.6, Tez has its own UI for tracking running DAGs and the history of completed DAGs. In most cases, users can see all DAG errors in the Tez UI. Details on how to set up the Tez UI are at http://tez.apache.org/tez-ui.html
Hive on Tez
First check the Hive client output (the Hive CLI and Beeline print some information when an error happens on Tez). If that is not clear, check the Hive log file, which is usually located at /tmp/{user}/hive.log. It may be located elsewhere if you configured a different location in $HIVE_HOME/conf/hive-log4j.properties.
If the Hive log still doesn't have enough information, refer to the YARN logs (see the section Yarn App Log).
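For a first pass over hive.log, a small Python sketch can pull out the lines that look like errors. `default_hive_log_path` and `error_lines` are hypothetical helpers. The default path is the one mentioned above. Treating lines that contain the word ERROR or the text Exception as errors is a heuristic assumption about log4j-style output, not something Hive guarantees.

```python
import re
from typing import Iterable, List

# Heuristic: log4j lines at ERROR level, or any line mentioning an exception.
_ERROR = re.compile(r"\bERROR\b|Exception")


def default_hive_log_path(user: str) -> str:
    """Default location described above; differs if hive-log4j.properties is customized."""
    return f"/tmp/{user}/hive.log"


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like errors, with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if _ERROR.search(line)]
```

For example, open `default_hive_log_path(getpass.getuser())` and pass the file object to `error_lines`. Stack-trace continuation lines are not matched, so open the full log around each hit.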
Pig on Tez
Cascading on Tez
TBD
Profiling in Tez
It is possible to profile specific tasks in specific vertices in Tez by using the "tez.task-specific.launch.cmd-opts.list" and "tez.task-specific.launch.cmd-opts" options. Examples are given below.
- tez.task-specific.launch.cmd-opts.list
- Specifies the tasks, in one or more vertices, to which the additional launch options are applied
- Examples:
- tez.task-specific.launch.cmd-opts.list="M5[0]" - Specifies task 0 in vertex M5
- tez.task-specific.launch.cmd-opts.list="Map10[5,20]" - Specifies tasks 5 and 20 in vertex Map10
- tez.task-specific.launch.cmd-opts.list="M5[]" - Specifies all tasks in M5
- tez.task-specific.launch.cmd-opts.list="M5[1:3,10]" - Specifies tasks 1, 2, 3 and 10 in M5
- However, defining partial ranges like "M5[:3]" is not yet supported.
- tez.task-specific.launch.cmd-opts.list="M5[0];V2[10]" - Specifies task 0 in vertex M5 and task 10 in vertex V2
- tez.task-specific.launch.cmd-opts
- Specifies the additional JVM launch options to add for the selected tasks.
- The placeholders __VERTEX_NAME__ and __TASK_INDEX__ can be used in the options; they are replaced at runtime with the vertex name and the task index.
- Examples:
- tez.task-specific.launch.cmd-opts="-agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=disablej2ee,tracing,alloceach=1000,onexit=snapshot,tracing_settings_path=/tmp/walltime.txt,dir=/tmp/__VERTEX_NAME__/__TASK_INDEX__" - Attaches the YourKit agent, with a separate output directory per vertex and task
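To make the selection syntax concrete, here is a Python sketch that parses a tez.task-specific.launch.cmd-opts.list value and expands the placeholders in tez.task-specific.launch.cmd-opts. It only models the behavior described by the examples above: inclusive ranges, "[]" for all tasks, and no partial ranges. It is not Tez's own parser, and the function names are hypothetical. Tolerating whitespace and merging repeated entries for the same vertex are assumptions.

```python
import re
from typing import Dict, Optional, Set

# One "<vertex>[<tasks>]" entry; entries are separated by ";".
_ENTRY = re.compile(r"^(?P<vertex>[^\[\]]+)\[(?P<tasks>[^\[\]]*)\]$")

Selection = Dict[str, Optional[Set[int]]]  # None means "all tasks"


def parse_task_list(spec: str) -> Selection:
    """Map each vertex name to its selected task indices (None = all tasks)."""
    selection: Selection = {}
    for entry in (e.strip() for e in spec.split(";")):
        if not entry:
            continue  # tolerate a trailing ";"
        m = _ENTRY.match(entry)
        if m is None:
            raise ValueError(f"malformed entry: {entry!r}")
        vertex, tasks = m.group("vertex").strip(), m.group("tasks").strip()
        if not tasks:
            selection[vertex] = None  # e.g. "M5[]" selects every task in M5
            continue
        indices: Set[int] = set()
        for part in (p.strip() for p in tasks.split(",")):
            if ":" in part:
                start, end = part.split(":", 1)
                if not start.strip() or not end.strip():
                    # Partial ranges such as "M5[:3]" are not supported.
                    raise ValueError(f"partial range not supported: {part!r}")
                indices.update(range(int(start), int(end) + 1))  # inclusive range
            else:
                indices.add(int(part))
        if vertex in selection and selection[vertex] is None:
            continue  # an earlier "V[]" already selects all tasks
        selection[vertex] = (selection.get(vertex) or set()) | indices
    return selection


def is_selected(selection: Selection, vertex: str, task_index: int) -> bool:
    """True if task task_index of vertex is covered by the parsed selection."""
    if vertex not in selection:
        return False
    tasks = selection[vertex]
    return tasks is None or task_index in tasks


def expand_opts(opts: str, vertex: str, task_index: int) -> str:
    """Substitute the __VERTEX_NAME__ and __TASK_INDEX__ placeholders."""
    return opts.replace("__VERTEX_NAME__", vertex).replace("__TASK_INDEX__", str(task_index))
```

For example, with the selection parsed from "M5[1:3,10]", task 2 of M5 is selected and task 4 is not, and expand_opts("dir=/tmp/__VERTEX_NAME__/__TASK_INDEX__", "M5", 3) gives "dir=/tmp/M5/3".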