
Here's a quick overview based on one developer's experience. To use YourKit with a standalone Java program, say Pig on the local file system, the command line to use is:
java -agentpath:BASEDIR/yjp-7.0.7/bin/linux-x86-32/libyjpagent.so=dir=/tmp/yourkit_snapshot,tracing,disablealloc,disablej2ee -cp <location of pig.jar> org.apache.pig.Main <pigscript>
In the above command line, /tmp/yourkit_snapshot is the output directory into which YourKit writes a ".snapshot" file. You can specify any directory to which you have write permission. YourKit seems to create the final directory in the path if it does not exist. The "tracing" option means that YourKit traces method calls to produce the profile. This gives accurate invocation counts, because every method call is traced rather than sampled, but it also makes the run slower.

The "disablealloc" option means memory allocations are not traced. The "disablej2ee" option means J2EE-specific profiling is disabled.
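As an illustration of how the agent option described above is put together, here is a minimal shell sketch. The function name yjp_agent_opt and its argument order are hypothetical and are not part of YourKit or Pig; only the option string itself comes from the command line above, and BASEDIR is left as the placeholder it is in the original.

```shell
#!/bin/sh
# Sketch only: assemble the -agentpath option used in the command above.
# yjp_agent_opt is a hypothetical helper name, not part of YourKit or Pig.
yjp_agent_opt() {
    agent_lib=$1      # path to libyjpagent.so
    snapshot_dir=$2   # directory that receives the .snapshot file
    mode=$3           # "tracing" or "sampling"
    case "$mode" in
        tracing|sampling) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # disablealloc and disablej2ee are always appended, as in the text
    printf '%s\n' "-agentpath:${agent_lib}=dir=${snapshot_dir},${mode},disablealloc,disablej2ee"
}

# Example: print the option for the local tracing run.
yjp_agent_opt BASEDIR/yjp-7.0.7/bin/linux-x86-32/libyjpagent.so /tmp/yourkit_snapshot tracing
```

The printed string can then be placed directly after "java" on the command line, before -cp.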
Using YourKit on a Pig script running on a *cluster* in sampling mode:
java -Dmapred.task.profile.maps=0-0 -Dmapred.task.profile.reduces=0-0 -Dmapred.task.profile=true -Dmapred.task.profile.params=-agentpath:CLUSTER_BASEDIR/libyjpagent.so=dir=/grid/0/tmp/yourkit_snapnshot,sampling,disablealloc,disablej2ee -cp <pig.jar pathname>:<dir containing hadoop-site.xml> org.apache.pig.Main <pig script>
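The task-profiling properties in the cluster command above can be generated rather than typed by hand. The sketch below is one way to do that; profile_props is a hypothetical helper name, and only the four property names are taken from the command above.

```shell
#!/bin/sh
# Sketch only: print the Hadoop task-profiling properties from the command above.
# profile_props is a hypothetical helper; the property names come from the text.
profile_props() {
    agent_opt=$1    # full -agentpath:... string valid on the cluster nodes
    task_range=$2   # task ids to profile, e.g. 0-0 for the 0th task only
    printf '%s\n' \
        "-Dmapred.task.profile=true" \
        "-Dmapred.task.profile.maps=${task_range}" \
        "-Dmapred.task.profile.reduces=${task_range}" \
        "-Dmapred.task.profile.params=${agent_opt}"
}

# Example: properties for sampling the 0th map and reduce tasks.
profile_props "-agentpath:CLUSTER_BASEDIR/libyjpagent.so=dir=/grid/0/tmp/yourkit_snapnshot,sampling,disablealloc,disablej2ee" 0-0
```

Each printed line is one argument to pass to "java" ahead of -cp.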

Using YourKit on a Pig script running on a *cluster* in tracing mode:

java -Dmapred.max.split.size=10000000 -Dmapred.task.timeout=60000000 -Dmapred.task.profile.maps=0-0 -Dmapred.task.profile.reduces=0-0 -Dmapred.task.profile=true -Dmapred.task.profile.params=-agentpath:CLUSTER_BASEDIR/libyjpagent.so=dir=/grid/0/tmp/yourkit_snapnshot,filters=/dev/null,tracing,disablealloc,disablej2ee -cp <pig.jar pathname>:<dir containing hadoop-site.xml> org.apache.pig.Main <pig script>

With the above commands, the 0th mapper and reducer tasks are profiled, and on the cluster machines running those tasks a YourKit snapshot file is created in /grid/0/tmp/yourkit_snapnshot. Copy this file to the machine with the YourKit GUI and load it there to look at the profile information. The sampling-mode command uses "sampling"; to use tracing instead, replace sampling with tracing in that command.
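To find the file to copy, something like the following sketch could be run on the cluster machine that executed the profiled task. The function name latest_snapshot is hypothetical, and the sketch assumes only what the text states: that the output directory contains files ending in ".snapshot".

```shell
#!/bin/sh
# Sketch only: print the most recently modified .snapshot file in a directory.
# latest_snapshot is a hypothetical helper name, not a YourKit tool.
latest_snapshot() {
    snapshot_dir=$1
    # ls -t sorts newest first; stderr is hidden so an empty
    # directory simply prints nothing.
    ls -t "$snapshot_dir"/*.snapshot 2>/dev/null | head -n 1
}

# Example: show the newest snapshot in the cluster output directory.
latest_snapshot /grid/0/tmp/yourkit_snapnshot
```

The printed path is the file to transfer to the machine running the YourKit GUI, for example with scp.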
