Comment: add some clarifications

...

A simple strategy for fulfilling outstanding requirements is to satisfy them on a first-come, first-served basis. Jobs that register their requirements first take precedence over other jobs, even if the requirements change at runtime. This approach is straightforward and prevents situations in which resources are spread across jobs in such a way that no job has enough.

For the time being (and in order to keep the protocol simpler), the ResourceManager won’t revoke resources/slots which are assigned to a different job. This means that the ResourceManager will only assign free resources in order to fulfill the resource requirements. In a future version, we might think about letting the ResourceManager balance resources across jobs. This would entail that the ResourceManager may ask the JobMaster to release slots.
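The first-come, first-served rule combined with the "free resources only" restriction can be illustrated with a minimal sketch. The class and method names below are hypothetical and not part of the proposal, and slots are reduced to plain counts; the point is only that jobs are served in registration order and that nothing is taken away from a job that already holds slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: first-come, first-served distribution of free slots only. */
final class FcfsSlotDistributor {

    /**
     * @param missingSlotsByJob missing slots per job; iteration order is registration order
     * @param freeSlots number of slots that are currently not assigned to any job
     * @return slots granted per job, in registration order
     */
    static Map<String, Integer> distribute(
            LinkedHashMap<String, Integer> missingSlotsByJob, int freeSlots) {
        Map<String, Integer> granted = new LinkedHashMap<>();
        int remaining = freeSlots;
        for (Map.Entry<String, Integer> entry : missingSlotsByJob.entrySet()) {
            // Earlier jobs are served first; later jobs only get what is left.
            int grant = Math.min(entry.getValue(), remaining);
            granted.put(entry.getKey(), grant);
            remaining -= grant;
        }
        return granted;
    }
}
```

With four free slots and two jobs that registered in the order job-a (3 missing) and job-b (2 missing), job-a receives all three of its slots and job-b receives the single remaining one.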

...

Currently, the SlotManager supports failing unfulfillable slot requests by calling ResourceActions.notifyAllocationFailure. A slot request is unfulfillable if the SlotManager neither has allocated slots nor can allocate a slot from the ResourceManager. This works because individual slot requests are identified by their AllocationID. With declarative resource management, we can no longer fail individual slot requests. However, we can let the JobMaster know if the minimum resource requirement for a job cannot be fulfilled after resourcemanager.standalone.start-up-time has passed. To send this notification, we have to introduce a new RPC, JobMaster.notifyNotEnoughResources(AvailableResources availableResources), where AvailableResources contains the set of resources available at the ResourceManager.
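The timing condition for this notification can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: the gateway interface is a hypothetical stand-in for the JobMaster RPC, AvailableResources is simplified to a slot count, and the start-up period is passed in as a Duration rather than read from resourcemanager.standalone.start-up-time.

```java
import java.time.Duration;

/** Hypothetical sketch of the "not enough resources" check after the start-up period. */
final class NotEnoughResourcesCheck {

    /** Stand-in for the proposed JobMaster RPC; a slot count replaces AvailableResources. */
    interface JobMasterGateway {
        void notifyNotEnoughResources(int availableSlots);
    }

    /**
     * Notifies the JobMaster only if the start-up period has elapsed and the
     * minimum requirement still cannot be met.
     *
     * @return true if a notification was sent
     */
    static boolean checkAndNotify(
            Duration sinceStart,
            Duration startUpPeriod,
            int minRequiredSlots,
            int availableSlots,
            JobMasterGateway jobMaster) {
        boolean startUpOver = sinceStart.compareTo(startUpPeriod) >= 0;
        if (startUpOver && availableSlots < minRequiredSlots) {
            jobMaster.notifyNotEnoughResources(availableSlots);
            return true;
        }
        return false;
    }
}
```

Before the start-up period has elapsed, the check stays silent even when resources are short, since more TaskExecutors may still register.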

...

In the first version, the SlotPool will aggregate the individual slot requests issued by the Scheduler into ResourceRequirements and announce these to the ResourceManager. Once a matching slot is returned, the corresponding request future can be completed.
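The aggregation step can be pictured as counting requests per resource profile. The sketch below is a simplification under assumptions: resource profiles are represented by plain strings, and the resulting map is only a stand-in for ResourceRequirements, whose actual structure is not described here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: fold individual slot requests into per-profile counts. */
final class RequirementAggregator {

    /**
     * @param requestedProfiles one entry per individual slot request
     * @return number of requested slots per profile, sorted by profile name
     */
    static Map<String, Integer> aggregate(List<String> requestedProfiles) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String profile : requestedProfiles) {
            // Each request adds one slot to the requirement for its profile.
            counts.merge(profile, 1, Integer::sum);
        }
        return counts;
    }
}
```

Three requests for the profiles small, large, and small would be announced as a requirement of two small slots and one large slot.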

The new SlotManager will internally compute slot requests based on the difference between declared resource requirements, and then follow code paths similar to those of the current version.
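One possible reading of this difference computation is sketched below. It assumes, which the text does not state, that the difference is taken between a job's declared requirements and the slots already allocated to it; all names are hypothetical and resource profiles are again plain strings.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: derive missing slots from declared vs. allocated counts. */
final class MissingSlotCalculator {

    /**
     * @param declared declared number of slots per profile
     * @param allocated slots already allocated per profile (assumption, see lead-in)
     * @return per profile, the number of slots that still have to be requested
     */
    static Map<String, Integer> missing(
            Map<String, Integer> declared, Map<String, Integer> allocated) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : declared.entrySet()) {
            int shortfall = entry.getValue() - allocated.getOrDefault(entry.getKey(), 0);
            if (shortfall > 0) {
                // Only profiles with an actual shortfall produce slot requests.
                result.put(entry.getKey(), shortfall);
            }
        }
        return result;
    }
}
```

Each entry of the result would then be turned into that many internal slot requests and handled much like individual requests are today.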

Lazy ExecutionGraph construction

...