Status

...

Discussion thread

...

JIRA: FLINK-10404

Release: 1.12


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Currently, the SlotManager supports failing unfulfillable slot requests by calling ResourceActions.notifyAllocationFailure. A slot request is unfulfillable if the SlotManager neither has allocated slots nor can allocate a slot from the ResourceManager. This works because we have individual slot requests, each identified by an AllocationID. With declarative resource management, we cannot fail individual slot requests. However, we can let the JobMaster know if we cannot fulfill the resource requirements for a job after resourcemanager.standalone.start-up-time has passed. In order to send this notification, we have to introduce a new RPC, JobMaster.notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources), where acquiredResources is the collection of resources acquired for the job.

This signal is sent whenever the SlotManager tried to fulfill the requirements for a job but failed to do so.

...

SlotManager interface extension:

interface SlotManager {
    /**
     * Process the given resource requirements. The resource requirements define the
     * required resources for the specified job. The SlotManager will try to fulfill
     * these requirements.
     *
     * @param resourceRequirements the resource requirements for a job
     */
    void processResourceRequirements(ResourceRequirements resourceRequirements);
}
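
To make the contract above more concrete, here is a minimal sketch of how a SlotManager might process requirements and detect a shortfall. It is not Flink's implementation: SketchSlotManager, the string-keyed resource profiles, and the NotEnoughResourcesListener callback are hypothetical stand-ins, and the start-up-time delay before notifying is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical callback standing in for the RPC to the JobMaster. */
interface NotEnoughResourcesListener {
    void onNotEnoughResources(String jobId, Map<String, Integer> acquired);
}

/**
 * Illustrative only: tracks free slots per profile name and fulfills requirements greedily.
 * One-shot per call: it does not remember slots a job acquired in earlier calls.
 */
final class SketchSlotManager {
    private final Map<String, Integer> freeSlots = new HashMap<>();
    private final NotEnoughResourcesListener listener;

    SketchSlotManager(Map<String, Integer> initialFreeSlots, NotEnoughResourcesListener listener) {
        this.freeSlots.putAll(initialFreeSlots);
        this.listener = listener;
    }

    /** Tries to fulfill the requirements (profile name -> slot count); returns what was acquired. */
    Map<String, Integer> processResourceRequirements(String jobId, Map<String, Integer> required) {
        Map<String, Integer> acquired = new HashMap<>();
        boolean fulfilled = true;
        for (Map.Entry<String, Integer> entry : required.entrySet()) {
            int free = freeSlots.getOrDefault(entry.getKey(), 0);
            int granted = Math.min(free, entry.getValue());
            freeSlots.put(entry.getKey(), free - granted);
            acquired.put(entry.getKey(), granted);
            if (granted < entry.getValue()) {
                fulfilled = false;
            }
        }
        // Signal the shortfall together with what could be acquired for the job.
        if (!fulfilled) {
            listener.onNotEnoughResources(jobId, acquired);
        }
        return acquired;
    }
}
```

The callback receives the acquired resources rather than the missing ones, mirroring the acquiredResources parameter described in this section.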


To enable the SlotManager to notify the JobMaster that not enough resources are available, we need to extend the JobMasterGateway with an additional method:

JobMasterGateway interface extension:

interface JobMasterGateway {
  /**
   * Notifies that not enough resources are available to fulfill the resource requirements of a job.
   *
   * @param acquiredResources the resources that have been acquired for the job
   */
  void notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources);
}


Accepting resources

On the JobMaster side, the SlotPool is responsible for accepting offered slots, and matching these against the requirements of the job. It has to follow the same logic for matching slots as the SlotManager.
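
As a rough illustration of the matching step, the sketch below pairs offered slots with outstanding requirements. Everything in it is an assumption made for illustration: SketchSlotMatcher is a hypothetical name, profiles are plain strings, and a slot matches only on an exact profile name, whereas the actual matching logic may be more permissive. Keeping the logic in one shared function is one way to have the SlotPool and the SlotManager apply the same rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: matches offered slots against outstanding requirements by exact profile name. */
final class SketchSlotMatcher {
    /**
     * @param outstanding profile name -> number of slots still required (not modified)
     * @param offeredProfiles profile name of each offered slot, in offer order
     * @return indices of the offered slots that fulfill an outstanding requirement
     */
    static List<Integer> matchOffers(Map<String, Integer> outstanding, List<String> offeredProfiles) {
        Map<String, Integer> remaining = new HashMap<>(outstanding);
        List<Integer> matched = new ArrayList<>();
        for (int i = 0; i < offeredProfiles.size(); i++) {
            String profile = offeredProfiles.get(i);
            int open = remaining.getOrDefault(profile, 0);
            if (open > 0) {
                // This offer covers one outstanding requirement of its profile.
                remaining.put(profile, open - 1);
                matched.add(i);
            }
        }
        return matched;
    }
}
```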

...

If the SlotPool is provided with more slots than are currently required, then it will return these slots after the idle slot timeout has passed. This serves as a sort of grace period, potentially allowing us to make use of excess slots later on without having to do another round-trip to the ResourceManager.
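
The grace period could be tracked along the following lines. This is a sketch under stated assumptions, not the SlotPool's actual bookkeeping: SketchIdleSlotTracker and its method names are hypothetical, slots are identified by plain strings, and the caller supplies the current time in milliseconds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: remembers when each slot became idle and reports slots idle past the timeout. */
final class SketchIdleSlotTracker {
    private final long idleTimeoutMillis;
    private final Map<String, Long> idleSince = new LinkedHashMap<>();

    SketchIdleSlotTracker(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Records that the slot is not needed as of the given time. */
    void markIdle(String slotId, long nowMillis) {
        idleSince.put(slotId, nowMillis);
    }

    /** The slot was reused within the grace period, so it is no longer a candidate for return. */
    void markUsed(String slotId) {
        idleSince.remove(slotId);
    }

    /** Removes and returns the slots whose idle time has reached the timeout. */
    List<String> collectExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = idleSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

A slot that is reused before the timeout simply drops out of the tracker, which is what saves the round-trip to the ResourceManager.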


Note: Depending on the scheduling requirements, it might make sense either to reuse slots that have been freed on the JobMaster, which reduces latency, or to return them and ask for properly sized slots, which improves resource utilization (assuming different resource requirements). At the moment, we assume that reusing slots is possible. In the future, we might have to make this behaviour configurable.

Releasing resources

The JobMaster releases resources (slots) by calling TaskExecutorGateway.freeSlot() and by calling ResourceManagerGateway.declareRequiredResources with the updated resource requirements.
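
The two calls involved in a release can be sketched as follows. The interfaces SketchTaskExecutor and SketchResourceManager are hypothetical, simplified stand-ins for the real gateways (whose signatures differ), and requirements are modelled as a map from profile name to slot count purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the TaskExecutor gateway; the real signature differs. */
interface SketchTaskExecutor {
    void freeSlot(String allocationId);
}

/** Hypothetical stand-in for the ResourceManager gateway; the real signature differs. */
interface SketchResourceManager {
    void declareRequiredResources(String jobId, Map<String, Integer> requirements);
}

/** Illustrative only: frees a slot and re-declares the reduced requirements of the job. */
final class SketchSlotReleaser {
    private final String jobId;
    private final Map<String, Integer> requirements;
    private final SketchTaskExecutor taskExecutor;
    private final SketchResourceManager resourceManager;

    SketchSlotReleaser(String jobId, Map<String, Integer> requirements,
                       SketchTaskExecutor taskExecutor, SketchResourceManager resourceManager) {
        this.jobId = jobId;
        this.requirements = new HashMap<>(requirements);
        this.taskExecutor = taskExecutor;
        this.resourceManager = resourceManager;
    }

    void release(String allocationId, String profile) {
        // Hand the slot back to the TaskExecutor.
        taskExecutor.freeSlot(allocationId);
        // Lower the requirement by one; drop the profile once nothing is required anymore.
        requirements.computeIfPresent(profile, (p, count) -> count > 1 ? count - 1 : null);
        // Declare the full, updated requirements (a copy, so later changes do not leak out).
        resourceManager.declareRequiredResources(jobId, new HashMap<>(requirements));
    }
}
```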

...

The slotmanager.request-timeout option will no longer have an effect.

Follow ups

Removing the AllocationID

Once the old SlotPool implementations are removed, it might be possible to remove the AllocationID and to identify slots via the SlotID.