Status
Current state: [Under Discussion]
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
JIRA: FLINK-14138
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Troubleshooting is difficult when, after a job is submitted, all of its subtasks remain in the SCHEDULED status indefinitely (as in the screenshot below).
Proposed Changes
The most common cause of this problem is that a vertex has requested more resources than the cluster can provide. A pending slots tab would let users check which vertex or subtask is blocked.
...
- add ScheduledUnit for SlotRequest.
- in SchedulerImpl.internalAllocateSlot, once allocationFuture has been set, call setPendingScheduledUnit(slotRequestId, scheduledUnit).
- add requestPendingSlotRequests in scheduler.
    /**
     * Requests the pending slot requests.
     *
     * @param timeout for the operation
     * @return the list of pending slot requests.
     */
    CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests(@RpcTimeout Time timeout);
- add requestPendingSlotRequestDetails in JobMasterGateway.
    /**
     * Request the details of pending slot requests of the current job.
     *
     * @param timeout for the rpc call
     * @return the list of pending slot requests.
     */
    CompletableFuture<Collection<JobPendingSlotRequestDetail>> requestPendingSlotRequestDetails(@RpcTimeout Time timeout);
- add JobPendingSlotRequestsHandler for rest.
- url: /jobs/:jobid/pendingslotrequest
- response:
    {
      "pending-slot-requests" : {
        "type" : "array",
        "items" : {
          "type" : "object",
          "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail",
          "properties" : {
            "vertex_id" : { "type" : "string" },
            "task_name" : { "type" : "string" },
            "slots" : {
              "type" : "array",
              "items" : {
                "type" : "object",
                "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail:SlotInfo",
                "properties" : {
                  "id" : { "type" : "string" },
                  "start_time" : { "type" : "long" },
                  "co-location_id" : { "type" : "string" },
                  "sharing_id" : { "type" : "string" }
                }
              }
            }
          }
        }
      },
      "total" : { "type" : "integer" }
    }
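To illustrate how a consumer such as the pending slots tab might use this response, here is a minimal Java sketch. It is not part of the proposal. The class names JobPendingSlotRequestDetail and SlotInfo and all field names are taken from the response schema above. The PendingSlotSummary helper and its methods are hypothetical, and the sketch assumes start_time is an epoch timestamp in milliseconds, which the schema does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** One pending slot, mirroring the SlotInfo object in the response schema. */
final class SlotInfo {
    final String id;           // "id"
    final long startTime;      // "start_time"; ASSUMED to be epoch milliseconds
    final String coLocationId; // "co-location_id"; may be null
    final String sharingId;    // "sharing_id"; may be null

    SlotInfo(String id, long startTime, String coLocationId, String sharingId) {
        this.id = id;
        this.startTime = startTime;
        this.coLocationId = coLocationId;
        this.sharingId = sharingId;
    }
}

/** One entry of "pending-slot-requests": a vertex and its pending slots. */
final class JobPendingSlotRequestDetail {
    final String vertexId;      // "vertex_id"
    final String taskName;      // "task_name"
    final List<SlotInfo> slots; // "slots"

    JobPendingSlotRequestDetail(String vertexId, String taskName, List<SlotInfo> slots) {
        this.vertexId = vertexId;
        this.taskName = taskName;
        this.slots = Collections.unmodifiableList(new ArrayList<>(slots));
    }
}

/** Hypothetical helper (not in the proposal) that summarizes pending requests per vertex. */
final class PendingSlotSummary {
    private PendingSlotSummary() {}

    /** Longest time any slot of this vertex has been pending; 0 if it has no pending slots. */
    static long longestWaitMillis(JobPendingSlotRequestDetail detail, long nowMillis) {
        long longest = 0L;
        for (SlotInfo slot : detail.slots) {
            longest = Math.max(longest, nowMillis - slot.startTime);
        }
        return longest;
    }

    /** One human-readable line per vertex, in the order the details were given. */
    static List<String> describe(List<JobPendingSlotRequestDetail> details, long nowMillis) {
        List<String> lines = new ArrayList<>();
        for (JobPendingSlotRequestDetail detail : details) {
            lines.add(detail.taskName + " (" + detail.vertexId + "): "
                    + detail.slots.size() + " pending, longest wait "
                    + longestWaitMillis(detail, nowMillis) + " ms");
        }
        return lines;
    }
}
```

A caller would deserialize the REST response into these classes and pass System.currentTimeMillis() as nowMillis. A vertex whose longest wait keeps growing is a likely candidate for the blocked vertex described in the motivation.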
Test Plan
Covered by unit tests