Status

Discussion thread	http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-165-Operator-s-Flame-Graphs-td49097.html
Vote thread
JIRA
Release	1.13

Motivation

It is desirable to provide better visibility into how CPU resources are distributed while user code executes. Flame Graphs are one of the most visually effective means of doing so. They make it easy to answer questions such as:

Which methods are currently consuming CPU resources?
How does consumption by one method compare to that of the others?
Which series of calls on the stack led to executing a particular method?

Flame Graphs are constructed by sampling stack traces a number of times. Every method call is represented by a bar whose length is proportional to the number of samples in which it appears.
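The aggregation described above can be sketched as a small tree in which each node counts the samples passing through it. This is only an illustration: FlameNode, addSample and relativeWidth are hypothetical names and not part of Flink, and the sketch assumes each stack trace is given as a list of frame names ordered from the outermost caller to the innermost callee.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flame graph node; an illustration, not a Flink class. */
final class FlameNode {
    final String frame;
    int samples; // number of sampled stack traces that pass through this frame
    final Map<String, FlameNode> children = new LinkedHashMap<>();

    FlameNode(String frame) {
        this.frame = frame;
    }

    /** Adds one sampled stack trace, ordered from outermost caller to innermost callee. */
    void addSample(List<String> rootFirstFrames) {
        samples++;
        FlameNode node = this;
        for (String f : rootFirstFrames) {
            // Identical call paths share nodes, so repeated samples only increase counters.
            node = node.children.computeIfAbsent(f, FlameNode::new);
            node.samples++;
        }
    }

    /** Bar width of a node relative to the root: the fraction of all samples containing it. */
    static double relativeWidth(FlameNode root, FlameNode node) {
        return root.samples == 0 ? 0.0 : (double) node.samples / root.samples;
    }
}
```

With three samples, two of which end in the same method, that method's bar would span two thirds of the root's width.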

Proposed Changes

A new REST handler (JobVertexFlameGraphHandler) is registered in the WebMonitorEndpoint. It responds at the /jobs/$job_id/vertices/$vertex_id/flamegraph URL. A call to this URL initiates parallel sampling of all instances of the selected operator (i.e., the Tasks that belong to the same $vertex_id).

A caching layer (ThreadInfoOperatorTracker) is introduced, based on the implementation previously used for backpressure sampling (BackPressureStatsTrackerImpl). The sampling process is coordinated by ThreadInfoRequestCoordinator, which is similar in functionality to the StackTraceSampleCoordinator.
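To illustrate the general idea of such a caching layer, the following minimal sketch reuses a previously computed result until a refresh interval has elapsed and only then triggers a new sampling run. It is not the ThreadInfoOperatorTracker implementation: the class name ExpiringStatsCache, the refresh interval, the injected clock and the synchronous sampler callback are all assumptions made for illustration, and a real tracker would likely run the sampling asynchronously.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache that recomputes a value only after a refresh interval has passed. */
final class ExpiringStatsCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long createdAtMillis;

        Entry(T value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long refreshIntervalMillis; // assumed configuration value
    private final LongSupplier clock;         // injected so that tests can control time

    ExpiringStatsCache(long refreshIntervalMillis, LongSupplier clock) {
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.clock = clock;
    }

    /** Returns the cached value for the key, invoking the sampler only if it is missing or stale. */
    synchronized V get(K key, Function<K, V> sampler) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.createdAtMillis >= refreshIntervalMillis) {
            entry = new Entry<>(sampler.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

Keyed by vertex ID, a cache like this would keep repeated requests from the UI from starting a new round of sampling on the TaskExecutors each time.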

One important distinction from the legacy backpressure stack trace sampling process is that the coordinator does not run as part of the JobManagerSharedServices but is instead initialized in the WebMonitorEndpoint. Instead of having to call JobMaster → DefaultScheduler in order to retrieve a “live” ExecutionGraph, the proposed implementation can use an ArchivedExecutionGraph. It is already available in the web monitor endpoint and can be used directly to locate the operator’s Tasks and their corresponding TaskExecutors. ThreadInfoRequestCoordinator can therefore be initialized and executed as part of the WebMonitorEndpoint, without adding non-core functionality to the JobManagerSharedServices.

The call flow is illustrated by the following sequence diagram:

A new method is added to the TaskExecutorGateway interface:

public interface TaskExecutorGateway extends RpcGateway, TaskExecutorOperatorEventGateway {
    /**
     * Request a thread info sample from the given task.
     *
     * @param taskExecutionAttemptId identifying the task to sample
     * @param requestParams parameters of the request
     * @param timeout of the request
     * @return Future of stack trace sample response
     */
    CompletableFuture<TaskThreadInfoResponse> requestThreadInfoSamples(
            ExecutionAttemptID taskExecutionAttemptId,
            ThreadInfoSamplesRequest requestParams,
            Time timeout);
}

Stack traces are collected and transferred as part of ThreadInfo objects, which contain additional information such as the thread state (Thread.State). This makes it possible to implement off-CPU Flame Graphs in addition to on-CPU Flame Graphs.
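As a rough illustration of how ThreadInfo samples can be obtained on the JVM, the sketch below repeatedly queries the standard java.lang.management.ThreadMXBean for a single thread. The class ThreadInfoSampler and its parameters (number of samples, delay between samples, maximum stack depth) are hypothetical and chosen for this example. They are not taken from ThreadInfoSamplesRequest, whose fields are not described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that samples one thread several times via the JVM management API. */
final class ThreadInfoSampler {

    static List<ThreadInfo> sample(long threadId, int numSamples, long delayMillis, int maxDepth) {
        ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
        List<ThreadInfo> samples = new ArrayList<>(numSamples);
        for (int i = 0; i < numSamples; i++) {
            // Each ThreadInfo carries both the stack trace and the thread state.
            ThreadInfo info = threadMxBean.getThreadInfo(threadId, maxDepth);
            if (info == null) {
                break; // the thread is no longer alive
            }
            samples.add(info);
            if (i < numSamples - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return samples;
    }
}
```

Each returned ThreadInfo exposes getStackTrace() and getThreadState(), which is the information needed to separate on-CPU from off-CPU samples.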

The distinction is made as follows:

On-CPU: Thread.State in [RUNNABLE, NEW]
Off-CPU: Thread.State in [TIMED_WAITING, WAITING, BLOCKED]
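The classification above can be expressed directly with java.lang.Thread.State. The following is a minimal sketch: the class ThreadStateClassifier and its method names are hypothetical, while the two state sets are the ones listed above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical classifier mirroring the on-CPU / off-CPU distinction listed above. */
final class ThreadStateClassifier {
    static final Set<Thread.State> ON_CPU =
            EnumSet.of(Thread.State.RUNNABLE, Thread.State.NEW);
    static final Set<Thread.State> OFF_CPU =
            EnumSet.of(Thread.State.TIMED_WAITING, Thread.State.WAITING, Thread.State.BLOCKED);

    static boolean isOnCpu(Thread.State state) {
        return ON_CPU.contains(state);
    }

    static boolean isOffCpu(Thread.State state) {
        return OFF_CPU.contains(state);
    }
}
```

Note that Thread.State.TERMINATED belongs to neither set in this sketch, because it does not appear in either list.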

A selector in the UI allows switching between the different types of Flame Graphs:

Mixed mode contains stack traces of threads in all possible states.

Selection is made via a type parameter in the request:

/jobs/$job_id/vertices/$vertex_id/flamegraph?type=on_cpu

/jobs/$job_id/vertices/$vertex_id/flamegraph?type=off_cpu

/jobs/$job_id/vertices/$vertex_id/flamegraph?type=full
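A client could assemble these request paths as in the sketch below. The class FlameGraphUrls, the enum Type and the method flameGraphPath are hypothetical names used for illustration. Only the path layout and the three type values (on_cpu, off_cpu, full) come from the URLs above.

```java
import java.util.Locale;

/** Hypothetical helper that builds the flame graph request path for a job vertex. */
final class FlameGraphUrls {

    enum Type {
        ON_CPU, OFF_CPU, FULL;

        /** Value of the "type" query parameter, e.g. ON_CPU becomes "on_cpu". */
        String queryValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    static String flameGraphPath(String jobId, String vertexId, Type type) {
        return "/jobs/" + jobId + "/vertices/" + vertexId + "/flamegraph?type=" + type.queryValue();
    }
}
```

The job and vertex IDs are inserted as-is, so this sketch assumes they are already URL-safe.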

Flame Graphs are accessible via a new component in the UI at the level of the selected operator:
