Status

Current state: [Under Discussion]

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html

JIRA:

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When a user submits a job and all of its subtasks stay in the SCHEDULED status (as in the screenshot below), it is hard to find out why.

Proposed Changes

The most common cause of this problem is that a vertex requests more resources than the cluster can provide. A pending slots tab would let users check which vertex or subtask is blocked.

Frontend Design

  • Add a pending slots tab that displays information about all pending slot requests.

  • Add an icon such as ‘?’ next to the job status that prompts the user to check the pending slots tab when a task has been in the CREATED status for more than 30 seconds.
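The 30-second rule for showing the ‘?’ icon boils down to a simple predicate. The sketch below is illustrative only: the class name PendingSlotHint, the simplified TaskStatus enum, and passing in the time at which the task entered its current status are assumptions, not part of the proposal.

```java
import java.time.Duration;

/** Hypothetical helper deciding whether the Web UI should show the '?' icon. */
final class PendingSlotHint {

    /** Simplified stand-in for Flink's execution states (assumption). */
    enum TaskStatus { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED }

    /** Threshold from the proposal: 30 seconds in the CREATED status. */
    static final Duration THRESHOLD = Duration.ofSeconds(30);

    private PendingSlotHint() {}

    /**
     * Returns true if the task has been in CREATED for more than the threshold.
     *
     * @param status current status of the task
     * @param enteredStatusMillis epoch millis at which the task entered its current status
     * @param nowMillis current epoch millis
     */
    static boolean shouldShowHint(TaskStatus status, long enteredStatusMillis, long nowMillis) {
        if (status != TaskStatus.CREATED) {
            return false;
        }
        // "More than 30 seconds" is read as strictly greater than the threshold.
        return nowMillis - enteredStatusMillis > THRESHOLD.toMillis();
    }
}
```

Keeping the threshold in one constant makes it easy to change if the discussion settles on a different value.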

REST API Design

  • Attach the ScheduledUnit to each SlotRequest.
    • In SchedulerImpl.internalAllocateSlot, call setPendingScheduledUnit(slotRequestId, scheduledUnit) after the allocationFuture has been created.
    • Add requestPendingSlotRequests to the scheduler:

/**
 * Requests the pending slot requests.
 *
 * @param timeout for the operation
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests(@RpcTimeout Time timeout);
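One way the scheduler could keep track of pending requests is a concurrent map from slot request id to the scheduled unit, filled by setPendingScheduledUnit and cleared once the allocation future completes. This is a minimal sketch under stated assumptions, not the actual SchedulerImpl code: ids are plain Strings, ScheduledUnit and PendingSlotRequest are simplified stand-ins, the extra nowMillis parameter (used to record the start time) and the onAllocationCompleted hook are hypothetical, and the @RpcTimeout parameter is omitted because there is no RPC layer here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of scheduler-side bookkeeping for pending slot requests. */
final class PendingSlotTracker {

    /** Simplified stand-in for Flink's ScheduledUnit (assumption). */
    static final class ScheduledUnit {
        final String vertexId;
        final String taskName;

        ScheduledUnit(String vertexId, String taskName) {
            this.vertexId = vertexId;
            this.taskName = taskName;
        }
    }

    /** Simplified stand-in for the proposed PendingSlotRequest (assumption). */
    static final class PendingSlotRequest {
        final String slotRequestId;
        final ScheduledUnit unit;
        final long startTimeMillis;

        PendingSlotRequest(String slotRequestId, ScheduledUnit unit, long startTimeMillis) {
            this.slotRequestId = slotRequestId;
            this.unit = unit;
            this.startTimeMillis = startTimeMillis;
        }
    }

    // Slot request id -> pending request. Concurrent because allocation
    // futures may complete on a different thread than the one registering them.
    private final Map<String, PendingSlotRequest> pending = new ConcurrentHashMap<>();

    /** Registers a request after its allocation future has been created. */
    void setPendingScheduledUnit(String slotRequestId, ScheduledUnit unit, long nowMillis) {
        pending.put(slotRequestId, new PendingSlotRequest(slotRequestId, unit, nowMillis));
    }

    /** Removes a request once its allocation future completes (success, failure or cancellation). */
    void onAllocationCompleted(String slotRequestId) {
        pending.remove(slotRequestId);
    }

    /** Returns a snapshot copy so callers never see later modifications. */
    CompletableFuture<Collection<PendingSlotRequest>> requestPendingSlotRequests() {
        Collection<PendingSlotRequest> snapshot = new ArrayList<>(pending.values());
        return CompletableFuture.completedFuture(snapshot);
    }
}
```

In internalAllocateSlot, the cleanup could be wired with something like allocationFuture.whenComplete((slot, failure) -> tracker.onAllocationCompleted(slotRequestId)), so that every request leaves the pending list however its allocation ends.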


  • Add requestPendingSlotRequestDetails to JobMasterGateway:

/**
 * Requests the details of the pending slot requests of the current job.
 *
 * @param timeout for the rpc call
 * @return the list of pending slot requests.
 */
CompletableFuture<Collection<JobPendingSlotRequestDetail>> requestPendingSlotRequestDetails(@RpcTimeout Time timeout);
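To answer requestPendingSlotRequestDetails, the JobMaster has to turn the flat list of pending requests into one entry per vertex, matching the vertex_id / task_name / slots layout of the proposed REST response. The sketch below assumes simplified, hypothetical versions of PendingSlotRequest, SlotInfo and JobPendingSlotRequestDetail (plain fields, String ids); it only illustrates the grouping step.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group flat pending slot requests by job vertex. */
final class PendingSlotDetailGrouper {

    /** Flat pending request as reported by the scheduler (simplified, assumption). */
    static final class PendingSlotRequest {
        final String slotRequestId, vertexId, taskName, coLocationId, sharingId;
        final long startTime;

        PendingSlotRequest(String slotRequestId, String vertexId, String taskName,
                long startTime, String coLocationId, String sharingId) {
            this.slotRequestId = slotRequestId;
            this.vertexId = vertexId;
            this.taskName = taskName;
            this.startTime = startTime;
            this.coLocationId = coLocationId;
            this.sharingId = sharingId;
        }
    }

    /** One entry of "slots": id, start_time, co-location_id, sharing_id. */
    static final class SlotInfo {
        final String id, coLocationId, sharingId;
        final long startTime;

        SlotInfo(String id, long startTime, String coLocationId, String sharingId) {
            this.id = id;
            this.startTime = startTime;
            this.coLocationId = coLocationId;
            this.sharingId = sharingId;
        }
    }

    /** One entry of "pending-slot-requests": vertex_id, task_name, slots. */
    static final class JobPendingSlotRequestDetail {
        final String vertexId, taskName;
        final List<SlotInfo> slots = new ArrayList<>();

        JobPendingSlotRequestDetail(String vertexId, String taskName) {
            this.vertexId = vertexId;
            this.taskName = taskName;
        }
    }

    static List<JobPendingSlotRequestDetail> groupByVertex(Collection<PendingSlotRequest> requests) {
        // LinkedHashMap keeps vertices in the order they are first seen.
        Map<String, JobPendingSlotRequestDetail> byVertex = new LinkedHashMap<>();
        for (PendingSlotRequest r : requests) {
            JobPendingSlotRequestDetail detail = byVertex.computeIfAbsent(
                    r.vertexId, id -> new JobPendingSlotRequestDetail(id, r.taskName));
            detail.slots.add(new SlotInfo(r.slotRequestId, r.startTime, r.coLocationId, r.sharingId));
        }
        return new ArrayList<>(byVertex.values());
    }
}
```

Grouping on the JobMaster side keeps the REST handler a thin conversion layer and lets the UI render one row per blocked vertex.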


  • Add a JobPendingSlotRequestsHandler to the REST API.
    • URL: /jobs/:jobid/pendingslotrequest
    • Response schema:


{
  "pending-slot-requests" : {
    "type" : "array",
    "items" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail",
      "properties" : {
        "vertex_id" : {
          "type" : "string"
        },
        "task_name" : {
          "type" : "string"
        },
        "slots" : {
          "type" : "array",
          "items" : {
            "type" : "object",
            "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:job:JobPendingSlotRequestDetail:SlotInfo",
            "properties" : {
              "id" : {
                "type" : "string"
              },
              "start_time" : {
                "type" : "integer"
              },
              "co-location_id" : {
                "type" : "string"
              },
              "sharing_id" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  },
  "total" : {
    "type" : "integer"
  }
}
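To make the schema concrete, the sketch below renders a response body by hand. It rests on assumptions the proposal does not state: "total" is taken to be the number of pending slot requests across all vertices, a missing co-location or sharing id is rendered as null, and the Slot and Vertex types are hypothetical minimal holders. Only quotes and backslashes are escaped; a real handler would use Flink's regular REST serialization rather than string building.

```java
import java.util.List;

/** Hypothetical sketch: render the proposed response body as JSON by hand. */
final class PendingSlotResponseJson {

    /** Minimal slot entry (assumption). */
    static final class Slot {
        final String id, coLocationId, sharingId;
        final long startTime;

        Slot(String id, long startTime, String coLocationId, String sharingId) {
            this.id = id;
            this.startTime = startTime;
            this.coLocationId = coLocationId;
            this.sharingId = sharingId;
        }
    }

    /** Minimal per-vertex entry (assumption). */
    static final class Vertex {
        final String vertexId, taskName;
        final List<Slot> slots;

        Vertex(String vertexId, String taskName, List<Slot> slots) {
            this.vertexId = vertexId;
            this.taskName = taskName;
            this.slots = slots;
        }
    }

    static String toJson(List<Vertex> vertices) {
        StringBuilder sb = new StringBuilder("{\"pending-slot-requests\":[");
        int total = 0;
        for (int i = 0; i < vertices.size(); i++) {
            Vertex v = vertices.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"vertex_id\":").append(str(v.vertexId))
              .append(",\"task_name\":").append(str(v.taskName))
              .append(",\"slots\":[");
            for (int j = 0; j < v.slots.size(); j++) {
                Slot s = v.slots.get(j);
                if (j > 0) sb.append(',');
                sb.append("{\"id\":").append(str(s.id))
                  .append(",\"start_time\":").append(s.startTime)
                  .append(",\"co-location_id\":").append(str(s.coLocationId))
                  .append(",\"sharing_id\":").append(str(s.sharingId))
                  .append('}');
                total++;
            }
            sb.append("]}");
        }
        // Assumption: "total" counts pending slot requests across all vertices.
        sb.append("],\"total\":").append(total).append('}');
        return sb.toString();
    }

    /** JSON string literal, or null; escapes only backslashes and double quotes. */
    private static String str(String value) {
        if (value == null) {
            return "null";
        }
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```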


Test Plan

Covered by unit tests.
