Status

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
Vote thread
JIRA

Release: <Flink Version>
Reason

See comment in the JIRA issue.

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

According to the documentation, the backpressure monitor is only triggered on request and is currently not available via metrics. As a result, the web UI cannot show the backpressure state of all vertices at the same time, and users have to check each vertex individually to see its backpressure state.

Proposed Changes

In Flink 1.9.0 and above, the user can infer the backpressure reason from the outPoolUsage, floatingBuffersUsage, and exclusiveBuffersUsage metrics.

The table in https://flink.apache.org/2019/07/23/flink-network-stack-2.html describes how to interpret these metrics.

FLINK-14472 implements a back-pressure monitor with non-blocking outputs. With the new back-pressure monitor, we can monitor the back-pressure metric directly.
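As a rough illustration of how the three usage metrics could be turned into a hint, here is a minimal sketch. The 0.8 threshold, the `NetworkUsage` shape, and the hint labels are all assumptions made for this example; the proposal does not define them, and the blog post linked above remains the reference for how to read these metrics.

```typescript
// Hypothetical threshold: the proposal does not define what counts as "high" usage.
const HIGH_USAGE = 0.8;

// Metric names follow the text above; the object shape is an assumption.
interface NetworkUsage {
  outPoolUsage: number; // fraction of output buffers in use, 0..1
  floatingBuffersUsage: number; // fraction of floating input buffers in use, 0..1
  exclusiveBuffersUsage: number; // fraction of exclusive input buffers in use, 0..1
}

// Hypothetical labels used only in this sketch.
type BackPressureHint = 'ok' | 'back-pressured' | 'possible-bottleneck';

function inferBackPressureHint(usage: NetworkUsage): BackPressureHint {
  const outputFull = usage.outPoolUsage >= HIGH_USAGE;
  const inputFull =
    usage.floatingBuffersUsage >= HIGH_USAGE ||
    usage.exclusiveBuffersUsage >= HIGH_USAGE;

  // Full output buffers: the task cannot emit records fast enough,
  // so it is most likely back-pressured by something downstream.
  if (outputFull) {
    return 'back-pressured';
  }
  // Full input buffers with free output buffers: the task itself may be
  // consuming too slowly and could be causing back pressure upstream.
  if (inputFull) {
    return 'possible-bottleneck';
  }
  return 'ok';
}
```

For example, a subtask with `outPoolUsage` at 0.95 would be flagged as `back-pressured`, while one with low `outPoolUsage` but `floatingBuffersUsage` at 0.9 would be flagged as `possible-bottleneck`.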

Frontend Design

Display the possible backpressure status together with outPoolUsage, floatingBuffersUsage, and exclusiveBuffersUsage at both the vertex graph level and the subtask level, so that users can find the back-pressured vertex quickly and infer the back-pressure reason from the metrics.
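One way the frontend could highlight where back pressure likely originates is sketched below. It assumes a common heuristic that is not part of this proposal: a vertex that is not back-pressured itself but has at least one back-pressured direct upstream vertex is a likely cause. The `GraphVertex` and `GraphEdge` types and the function name are hypothetical.

```typescript
// Hypothetical, simplified view of the job graph shown in the web UI.
interface GraphVertex {
  id: string;
  isBackPressured: boolean;
}

interface GraphEdge {
  from: string; // upstream vertex id
  to: string; // downstream vertex id
}

// Returns ids of vertices that are not back-pressured themselves
// but receive data from at least one back-pressured vertex.
function findLikelyBackPressureSources(
  vertices: GraphVertex[],
  edges: GraphEdge[]
): string[] {
  const backPressured = new Set(
    vertices.filter(v => v.isBackPressured).map(v => v.id)
  );
  return vertices
    .filter(v => !v.isBackPressured)
    .filter(v => edges.some(e => e.to === v.id && backPressured.has(e.from)))
    .map(v => v.id);
}
```

In a linear job where the first two vertices are back-pressured and the last one is not, this returns only the last vertex, which the UI could then mark on the graph.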


REST API Design

  • Expose the new mechanism implemented in FLINK-14472 as an "is back-pressured" metric.
  • Show the vertex that is the source of the backpressure in the job.
  • Expose network metrics in IOMetricsInfo.
  • Subtask level
    • url: /jobs/:jobId/vertices/:vertexId
    • response: add network metrics such as out-pool-usage, input-exclusive-pool-usage, input-floating-pool-usage, and isBackPressured to the subtasks' metrics.
  • Vertex level
    • url: /jobs/:jobid
    • response: add network metrics such as out-pool-usage-avg, input-exclusive-pool-usage-avg, input-floating-pool-usage-avg, and isBackPressured to the vertices' metrics.
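The relationship between the subtask-level and vertex-level fields listed above could look roughly like the following sketch. The field names come from the list. The interface names, the plain arithmetic mean, and the rule that a vertex counts as back-pressured when any of its subtasks is are assumptions for illustration, not the actual Flink implementation.

```typescript
// Network fields added to each subtask's metrics (names from the list above).
interface SubtaskNetworkMetrics {
  'out-pool-usage': number;
  'input-exclusive-pool-usage': number;
  'input-floating-pool-usage': number;
  isBackPressured: boolean;
}

// Network fields added to each vertex's metrics (names from the list above).
interface VertexNetworkMetrics {
  'out-pool-usage-avg': number;
  'input-exclusive-pool-usage-avg': number;
  'input-floating-pool-usage-avg': number;
  isBackPressured: boolean;
}

// Arithmetic mean; returns 0 for an empty list to avoid dividing by zero.
function average(values: number[]): number {
  if (values.length === 0) {
    return 0;
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function aggregateVertexMetrics(
  subtasks: SubtaskNetworkMetrics[]
): VertexNetworkMetrics {
  return {
    'out-pool-usage-avg': average(subtasks.map(s => s['out-pool-usage'])),
    'input-exclusive-pool-usage-avg': average(
      subtasks.map(s => s['input-exclusive-pool-usage'])
    ),
    'input-floating-pool-usage-avg': average(
      subtasks.map(s => s['input-floating-pool-usage'])
    ),
    // Assumption: a vertex is back-pressured if any of its subtasks is.
    isBackPressured: subtasks.some(s => s.isBackPressured),
  };
}
```

Averaging keeps the vertex-level response small, but it can hide a single saturated subtask, which is why the subtask-level endpoint still exposes the raw values.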

Test Plan

We need to update the existing UI tests and add new ones to make sure the new REST API works as expected.