Status

Current state: Under Discussion

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html

JIRA: FLINK-14138

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When a user submits a job and all of its subtasks stay in the SCHEDULED status (as in the screenshot below), it is hard to find out why.

Proposed Changes

The most common cause of this problem is that a vertex has requested more resources than the cluster has available. A pending slots tab would let users check which vertex or subtask is blocked.

...

  • Add the ScheduledUnit to each SlotRequest.
    • In SchedulerImpl.internalAllocateSlot, once allocationFuture is set, call setPendingScheduledUnit(slotRequestId, scheduledUnit).
    • Add requestPendingSlotRequests to the scheduler.

/**
 * Requests the pending slot requests.
 *
 * @param timeout for the operation
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests(@RpcTimeout Time timeout);
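The bookkeeping described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Flink's actual implementation: `PendingSlotTracker`, the plain `String` identifiers, and the simplified `ScheduledUnit` and `PendingSlotRequest` classes are hypothetical stand-ins, the timeout parameter is omitted, and the `removePendingScheduledUnit` cleanup hook is an assumption about how fulfilled requests would be dropped.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for Flink's ScheduledUnit: names the subtask waiting for a slot. */
final class ScheduledUnit {
    final String vertexId;
    final String taskName;

    ScheduledUnit(String vertexId, String taskName) {
        this.vertexId = vertexId;
        this.taskName = taskName;
    }
}

/** Hypothetical view of one slot request that has not been fulfilled yet. */
final class PendingSlotRequest {
    final String slotRequestId;
    final ScheduledUnit scheduledUnit;
    final long startTime;

    PendingSlotRequest(String slotRequestId, ScheduledUnit scheduledUnit, long startTime) {
        this.slotRequestId = slotRequestId;
        this.scheduledUnit = scheduledUnit;
        this.startTime = startTime;
    }
}

/** Hypothetical tracker: remembers which scheduled unit is behind each open slot request. */
final class PendingSlotTracker {
    private final Map<String, PendingSlotRequest> pending = new ConcurrentHashMap<>();

    /** Records the scheduled unit for a slot request that is still waiting. */
    void setPendingScheduledUnit(String slotRequestId, ScheduledUnit unit) {
        pending.put(
                slotRequestId,
                new PendingSlotRequest(slotRequestId, unit, System.currentTimeMillis()));
    }

    /** Assumed cleanup hook: forget a request once it is fulfilled or cancelled. */
    void removePendingScheduledUnit(String slotRequestId) {
        pending.remove(slotRequestId);
    }

    /** Returns a snapshot of all requests that are still pending. */
    CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests() {
        Collection<PendingSlotRequest> snapshot = new ArrayList<>(pending.values());
        return CompletableFuture.completedFuture(snapshot);
    }
}
```

Returning a copied snapshot rather than the live map keeps callers from observing later mutations, which seems a reasonable choice for a value that is sent over RPC.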


  • Add requestPendingSlotRequestDetails to JobMasterGateway.

/**
 * Request the details of pending slot requests of the current job.
 *
 * @param timeout for the rpc call
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<JobPendingSlotRequestDetail>> requestPendingSlotRequestDetails(@RpcTimeout Time timeout);
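One way the per-job details could be derived is by grouping the scheduler's pending requests by vertex. The sketch below only illustrates that grouping step under stated assumptions: `PendingEntry`, `VertexPendingDetail`, and `PendingDetailAggregator` are hypothetical names, and the real JobPendingSlotRequestDetail would carry more per-slot information than a request ID.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flat record of one pending slot request as reported by the scheduler. */
final class PendingEntry {
    final String slotRequestId;
    final String vertexId;
    final String taskName;

    PendingEntry(String slotRequestId, String vertexId, String taskName) {
        this.slotRequestId = slotRequestId;
        this.vertexId = vertexId;
        this.taskName = taskName;
    }
}

/** Hypothetical per-vertex detail: a vertex plus the IDs of its unfulfilled slot requests. */
final class VertexPendingDetail {
    final String vertexId;
    final String taskName;
    final List<String> slotRequestIds = new ArrayList<>();

    VertexPendingDetail(String vertexId, String taskName) {
        this.vertexId = vertexId;
        this.taskName = taskName;
    }
}

final class PendingDetailAggregator {
    /** Groups pending requests by vertex, keeping the order in which vertices first appear. */
    static Collection<VertexPendingDetail> groupByVertex(Collection<PendingEntry> entries) {
        Map<String, VertexPendingDetail> byVertex = new LinkedHashMap<>();
        for (PendingEntry entry : entries) {
            byVertex
                    .computeIfAbsent(
                            entry.vertexId,
                            id -> new VertexPendingDetail(id, entry.taskName))
                    .slotRequestIds
                    .add(entry.slotRequestId);
        }
        return new ArrayList<>(byVertex.values());
    }
}
```

Grouping by vertex mirrors the shape of the REST response, where each vertex entry holds an array of its pending slots.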


  • Add JobPendingSlotRequestsHandler to the REST API.
    • URL: /jobs/:jobid/pendingslotrequest
    • Response:


{
  "pending-slot-requests" : {
    "type" : "array",
    "items" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail",
      "properties" : {
        "vertex_id" : {
          "type" : "string"
        },
        "task_name" : {
          "type" : "string"
        },
        "slots" : {
          "type" : "array",
          "items" : {
            "type" : "object",
            "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail:SlotInfo",
            "properties" : {
              "id" : {
                "type" : "string"
              },
              "start_time" : {
                "type" : "long"
              },
              "co-location_id" : {
                "type" : "string"
              },
              "sharing_id" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  },
  "total" : {
    "type" : "integer"
  }
}
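
For illustration, a client could query the proposed endpoint as sketched below. This assumes Java 11 or later for `java.net.http`; the `PendingSlotRequestClient` class and its `buildRequest` helper are hypothetical, the base URL is whatever address the cluster's REST endpoint listens on, and the endpoint itself exists only once this proposal is implemented.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a request for the proposed pending-slot-request endpoint. */
final class PendingSlotRequestClient {

    /**
     * Builds a GET request for /jobs/:jobid/pendingslotrequest.
     *
     * @param baseUrl root of the REST endpoint, without a trailing slash (assumed by this sketch)
     * @param jobId   ID of the job whose pending slot requests should be listed
     */
    static HttpRequest buildRequest(String baseUrl, String jobId) {
        URI uri = URI.create(baseUrl + "/jobs/" + jobId + "/pendingslotrequest");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the returned body parsed with any JSON library according to the schema above.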


Test Plan

The changes are covered by unit tests.