Status

Current state: Under Discussion

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html

JIRA: FLINK-14138


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

After a user submits a job, it is hard to troubleshoot when all subtasks remain in the SCHEDULED status (as in the screenshot below).

Proposed Changes

The most common cause of this problem is that a vertex has requested more resources than the cluster has available. A Pending Slots tab would help users check which vertex or subtask is blocked.

Frontend Design

  • Add the pending ... status to the vertex node to show the pending reason.


  • Add an icon such as ‘?’ to the job status, prompting the user to check it when a task has been in the CREATED status for more than 30 seconds.
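
The 30-second rule above can be sketched independently of the UI framework. This is only an illustration of the threshold check: the class name, method name, and the millisecond timestamps are assumptions, not part of the proposal or of the Flink web frontend.

```java
// Hypothetical helper; names and signature are illustrative only.
final class PendingHint {
    // Threshold taken from the proposal: 30 seconds, expressed in milliseconds.
    static final long THRESHOLD_MILLIS = 30_000L;

    // Returns true when a task has been in CREATED for strictly more than 30 seconds.
    // statusSinceMillis: assumed timestamp at which the task entered its current status.
    static boolean shouldShowHint(String status, long statusSinceMillis, long nowMillis) {
        return "CREATED".equals(status)
                && nowMillis - statusSinceMillis > THRESHOLD_MILLIS;
    }
}
```

Whether the comparison should be strict, and which timestamp marks the start of the CREATED status, are choices left open by the proposal.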


REST API Design

  • Add ScheduledUnit for SlotRequest.
    • In SchedulerImpl.internalAllocateSlot, after allocationFuture is created, call setPendingScheduledUnit(slotRequestId, scheduledUnit).
    • Add requestPendingSlotRequests to the scheduler.

/**
 * Requests the pending slot requests.
 *
 * @param timeout for the operation
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests(@RpcTimeout Time timeout);
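
The bookkeeping described in the bullets above might look roughly like the following. This is a minimal sketch under stated assumptions, not Flink's SchedulerImpl: the class PendingSlotTracker, the use of plain strings for the request id and the scheduled unit, and the removal of an entry once the allocation future completes are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the proposed bookkeeping; not Flink's SchedulerImpl.
final class PendingSlotTracker {
    // slotRequestId -> description of the scheduled unit waiting for a slot
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    // Record the unit as pending, then forget it once the allocation completes,
    // whether normally or exceptionally (an assumption, not stated in the proposal).
    <T> void track(String slotRequestId, String scheduledUnit,
                   CompletableFuture<T> allocationFuture) {
        pending.put(slotRequestId, scheduledUnit);
        allocationFuture.whenComplete((slot, failure) -> pending.remove(slotRequestId));
    }

    // Snapshot of the units that are still waiting for a slot.
    Collection<String> requestPendingSlotRequests() {
        return new ArrayList<>(pending.values());
    }
}
```

Registering the entry before attaching the completion callback means a future that is already complete is added and removed immediately, so it never shows up as pending.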


  • Add requestPendingSlotRequestDetails to JobMasterGateway.

/**
 * Requests the details of pending slot requests of the current job.
 *
 * @param timeout for the rpc call
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<JobPendingSlotRequestDetail>> requestPendingSlotRequestDetails(@RpcTimeout Time timeout);
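
One way the job master could turn the scheduler's flat list of pending requests into per-vertex details is a simple grouping step. The sketch below assumes simplified, hypothetical shapes (PendingRequest, PendingDetails); the actual fields of PendingSlotRequest and JobPendingSlotRequestDetail are not specified in this proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shape of one pending slot request.
final class PendingRequest {
    final String slotRequestId;
    final String vertexId;
    final long startTime;

    PendingRequest(String slotRequestId, String vertexId, long startTime) {
        this.slotRequestId = slotRequestId;
        this.vertexId = vertexId;
        this.startTime = startTime;
    }
}

// Hypothetical helper illustrating the aggregation step.
final class PendingDetails {
    // Groups pending slot request ids by vertex id, keeping first-seen vertex order.
    static Map<String, List<String>> groupByVertex(List<PendingRequest> requests) {
        Map<String, List<String>> byVertex = new LinkedHashMap<>();
        for (PendingRequest r : requests) {
            byVertex.computeIfAbsent(r.vertexId, k -> new ArrayList<>())
                    .add(r.slotRequestId);
        }
        return byVertex;
    }
}
```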


  • Add JobPendingSlotRequestsHandler for the REST API.
    • URL: /jobs/:jobid/pendingslotrequest
    • Response:


{
  "pending-slot-requests" : {
    "type" : "array",
    "items" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail",
      "properties" : {
        "vertex_id" : {
          "type" : "string"
        },
        "task_name" : {
          "type" : "string"
        },
        "slots" : {
          "type" : "array",
          "items" : {
            "type" : "object",
            "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail:SlotInfo",
            "properties" : {
              "id" : {
                "type" : "string"
              },
              "start_time" : {
                "type" : "long"
              },
              "co-location_id" : {
                "type" : "string"
              },
              "sharing_id" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  },
  "total" : {
    "type" : "integer"
  }
}
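
For orientation, the response schema above could map onto plain Java value classes along these lines. The class names are hypothetical, and the meaning of "total" is an assumption: the schema only says it is an integer, and this sketch treats it as the number of pending slots summed over all vertices.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the "slots" items of the schema; class name is illustrative.
final class SlotInfo {
    final String id;
    final long startTime;       // "start_time"
    final String coLocationId;  // "co-location_id"
    final String sharingId;     // "sharing_id"

    SlotInfo(String id, long startTime, String coLocationId, String sharingId) {
        this.id = id;
        this.startTime = startTime;
        this.coLocationId = coLocationId;
        this.sharingId = sharingId;
    }
}

// Mirrors one item of "pending-slot-requests"; class name is illustrative.
final class VertexPendingSlots {
    final String vertexId;  // "vertex_id"
    final String taskName;  // "task_name"
    final List<SlotInfo> slots;

    VertexPendingSlots(String vertexId, String taskName, List<SlotInfo> slots) {
        this.vertexId = vertexId;
        this.taskName = taskName;
        this.slots = new ArrayList<>(slots);
    }
}

// Top-level response body; class name is illustrative.
final class PendingSlotResponse {
    final List<VertexPendingSlots> pendingSlotRequests;
    final int total;

    PendingSlotResponse(List<VertexPendingSlots> pendingSlotRequests) {
        this.pendingSlotRequests = new ArrayList<>(pendingSlotRequests);
        int count = 0;
        for (VertexPendingSlots v : pendingSlotRequests) {
            count += v.slots.size();
        }
        // ASSUMPTION: "total" counts pending slots across all vertices.
        this.total = count;
    }
}
```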


Test Plan

Covered by unit tests.