Status
Motivation
It is desirable to provide better visibility into how CPU resources are distributed while user code is executing. One of the most visually effective means of doing so is the Flame Graph. Flame Graphs make it easy to answer questions such as:
Which methods are currently consuming CPU resources?
How does consumption by one method compare to the others?
Which series of calls on the stack led to executing a particular method?
Flame Graphs are constructed by sampling stack traces a number of times. Every method call is represented by a bar whose length is proportional to the number of samples in which it appears.
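The aggregation described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the class `FlameGraphSketch`, its nested `Node` type, and the frame names are hypothetical. Each sampled stack is merged into a tree, and the per-node sample count is what determines the bar length.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlameGraphSketch {

    /** One bar of the graph: a frame plus the number of samples that passed through it. */
    static final class Node {
        final String frame;
        int samples;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String frame) {
            this.frame = frame;
        }

        /** Merges one sampled stack, given outermost caller first. */
        void addSample(List<String> stackRootFirst) {
            Node node = this;
            node.samples++;
            for (String f : stackRootFirst) {
                node = node.children.computeIfAbsent(f, Node::new);
                node.samples++;
            }
        }

        /** Renders one line per bar; the count in parentheses is the relative bar length. */
        String render(String indent) {
            StringBuilder sb = new StringBuilder();
            sb.append(indent).append(frame).append(" (").append(samples).append(")\n");
            for (Node child : children.values()) {
                sb.append(child.render(indent + "  "));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        // Hypothetical samples; real ones would come from stack trace sampling.
        root.addSample(List.of("run", "processElement", "map"));
        root.addSample(List.of("run", "processElement", "map"));
        root.addSample(List.of("run", "processElement", "filter"));
        root.addSample(List.of("run", "checkpoint"));
        System.out.print(root.render(""));
    }
}
```

A frame that appears in more samples accumulates a larger count, so it would be drawn as a wider bar.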
Proposed Changes
A new REST handler (JobVertexFlameGraphHandler) is registered in the WebMonitorEndpoint. It responds on the /jobs/$job_id/vertices/$vertex_id/flamegraph URL. A call to this URL initiates sampling in parallel of all instances of the selected operator (i.e. Tasks that belong to the same $vertex_id).
A caching layer (ThreadInfoOperatorTracker) is introduced, based on the implementation previously used for backpressure sampling (BackPressureStatsTrackerImpl). The sampling process is correspondingly coordinated by ThreadInfoRequestCoordinator, which is similar in functionality to the StackTraceSampleCoordinator.
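The parallel fan-out that such a coordinator performs can be sketched as below. This is an assumption-laden illustration rather than the ThreadInfoRequestCoordinator itself: `ParallelSamplingSketch`, `sampleAll`, and the `requestSample` function (a stand-in for the RPC to a TaskExecutor) are hypothetical names. It only shows how one request per subtask can be combined into a single future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class ParallelSamplingSketch {

    /**
     * Issues one sampling request per subtask and completes once all have answered.
     * requestSample is a hypothetical stand-in for the call to a TaskExecutor.
     */
    static <S> CompletableFuture<List<S>> sampleAll(
            List<Integer> subtaskIndices,
            Function<Integer, CompletableFuture<S>> requestSample) {
        List<CompletableFuture<S>> pending = new ArrayList<>();
        for (Integer subtask : subtaskIndices) {
            pending.add(requestSample.apply(subtask));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<S> results = new ArrayList<>();
                    for (CompletableFuture<S> future : pending) {
                        // Every future is already complete here, so join() does not block.
                        results.add(future.join());
                    }
                    return results;
                });
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> all = sampleAll(
                List.of(0, 1, 2),
                subtask -> CompletableFuture.supplyAsync(() -> "sample-of-subtask-" + subtask));
        System.out.println(all.join());
    }
}
```

If any single request fails, the combined future fails as well; a real coordinator would additionally need timeouts and handling of partial results, which this sketch leaves out.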
One important distinction from the legacy backpressure stack trace sampling process is that the coordinator does not run as part of the JobManagerSharedServices but is instead initialized in the WebMonitorEndpoint. Rather than having to call JobMaster → DefaultScheduler in order to retrieve a “live” ExecutionGraph, the proposed implementation can use an ArchivedExecutionGraph. It is already available in the web monitor endpoint and can be used directly for locating the operator’s Tasks and their corresponding TaskExecutors. ThreadInfoRequestCoordinator can therefore be initialized and executed as part of the WebMonitorEndpoint instead of adding non-core functionality to the JobManagerSharedServices.
Call flow is illustrated by the following sequence diagram:
A new method is added to the TaskExecutorGateway interface:
public interface TaskExecutorGateway extends RpcGateway, TaskExecutorOperatorEventGateway {

    /**
     * Request a thread info sample from the given task.
     *
     * @param taskExecutionAttemptId identifying the task to sample
     * @param requestParams parameters of the request
     * @param timeout of the request
     * @return Future of stack trace sample response
     */
    CompletableFuture<TaskThreadInfoResponse> requestThreadInfoSamples(
            ExecutionAttemptID taskExecutionAttemptId,
            ThreadInfoSamplesRequest requestParams,
            Time timeout);
}
Stack traces are collected and transferred as part of ThreadInfo objects, which contain additional information such as the Thread.State. This makes it possible to implement off-CPU Flame Graphs in addition to on-CPU Flame Graphs.
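As a rough sketch of how ThreadInfo objects can be sampled on the JVM, the example below uses the standard `java.lang.management.ThreadMXBean` API. The class `ThreadInfoSamplingSketch`, the method `sample`, and its parameters (number of samples, delay, stack depth) are assumptions made for illustration; the proposal does not specify them here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class ThreadInfoSamplingSketch {

    /** Takes up to numSamples snapshots of one thread, pausing delayMillis between them. */
    static List<ThreadInfo> sample(long threadId, int numSamples, long delayMillis, int maxDepth) {
        ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
        List<ThreadInfo> samples = new ArrayList<>();
        for (int i = 0; i < numSamples; i++) {
            // Returns null if the thread is no longer alive.
            ThreadInfo info = threadMxBean.getThreadInfo(threadId, maxDepth);
            if (info != null) {
                samples.add(info);
            }
            if (i < numSamples - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        long id = Thread.currentThread().getId();
        for (ThreadInfo info : sample(id, 3, 10, 20)) {
            System.out.println(info.getThreadState() + ", " + info.getStackTrace().length + " frames");
        }
    }
}
```

Each ThreadInfo carries both the stack trace and the thread state at the moment of the snapshot, which is what allows samples to be filtered by state afterwards.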
The distinction is made as follows:
On-CPU: Thread.State in [RUNNABLE, NEW]
Off-CPU: Thread.State in [TIMED_WAITING, WAITING, BLOCKED]
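The classification above can be expressed with two sets of `Thread.State` values. This is a minimal sketch; the class and method names (`FlameGraphStateFilterSketch`, `isOnCpu`, `isOffCpu`) are hypothetical. Note that TERMINATED appears in neither list, so the sketch assigns it to neither category.

```java
import java.util.EnumSet;
import java.util.Set;

class FlameGraphStateFilterSketch {

    // States counted as on-CPU, as listed above.
    static final Set<Thread.State> ON_CPU_STATES =
            EnumSet.of(Thread.State.RUNNABLE, Thread.State.NEW);

    // States counted as off-CPU, as listed above.
    static final Set<Thread.State> OFF_CPU_STATES =
            EnumSet.of(Thread.State.TIMED_WAITING, Thread.State.WAITING, Thread.State.BLOCKED);

    static boolean isOnCpu(Thread.State state) {
        return ON_CPU_STATES.contains(state);
    }

    static boolean isOffCpu(Thread.State state) {
        return OFF_CPU_STATES.contains(state);
    }

    public static void main(String[] args) {
        for (Thread.State state : Thread.State.values()) {
            System.out.println(state + ": onCpu=" + isOnCpu(state) + ", offCpu=" + isOffCpu(state));
        }
    }
}
```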
A selector in the UI allows switching between the different types of Flame Graphs:
Mixed mode contains stack traces of threads in all possible states.
Selection is made via a type parameter in the request:
/jobs/$job_id/vertices/$vertex_id/flamegraph?type=on_cpu
/jobs/$job_id/vertices/$vertex_id/flamegraph?type=off_cpu
/jobs/$job_id/vertices/$vertex_id/flamegraph?type=full
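A client could assemble these request URLs along the following lines. This is only a sketch: the class `FlameGraphUrlSketch`, the enum `FlameGraphType`, the base address `http://localhost:8081`, and the placeholder IDs are assumptions for illustration; only the path and the three `type` values come from the list above.

```java
import java.net.URI;
import java.util.Locale;

class FlameGraphUrlSketch {

    /** The three values of the type parameter listed above. */
    enum FlameGraphType {
        ON_CPU, OFF_CPU, FULL;

        String queryValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** Builds the flame graph URL for one vertex; restBase is a hypothetical REST address. */
    static URI flameGraphUri(String restBase, String jobId, String vertexId, FlameGraphType type) {
        return URI.create(restBase
                + "/jobs/" + jobId
                + "/vertices/" + vertexId
                + "/flamegraph?type=" + type.queryValue());
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real job or vertex identifiers.
        for (FlameGraphType type : FlameGraphType.values()) {
            System.out.println(flameGraphUri("http://localhost:8081", "JOB_ID", "VERTEX_ID", type));
        }
    }
}
```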
Flame Graphs are accessible via a new component in the UI at the level of the selected operator:
1 Comment
Jacky Lau
Hi Alexander Fedulov, I found that the Flame Graph web UI hangs when the job's parallelism is large, such as 500+. I support it using async-profiler by adding scripting capability to the TaskManager; that works at the TaskManager level instead of the JobVertex level.