...

  • Created: Initial state of the scheduler.
  • Waiting for resources: The scheduler has declared the resources it requires and waits until either these requirements are fulfilled or the set of available resources has stabilised.
  • Executing: The set of resources is stable and the scheduler has decided on the parallelism with which to execute the job. The ExecutionGraph is created and the execution of the job has started.
  • Restarting: A recoverable fault has occurred. The scheduler stops the ExecutionGraph by canceling it.
  • Canceling: The job has been canceled by the user. The scheduler stops the ExecutionGraph by canceling it.
  • Failing: An unrecoverable fault has occurred. The scheduler stops the ExecutionGraph by canceling it.
  • Finished: The job execution has completed.

In the states “Created” and “Waiting for resources”, no ExecutionGraph (EG) exists yet. The EG can only be instantiated once the scheduler has acquired enough resources to run the job. Hence, all operations that require the EG are ignored until the scheduler reaches a state in which an EG exists.
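To make the rule above concrete, here is a minimal sketch of how the listed states and the "ignore EG operations until an EG exists" check could be modelled. The class, enum and method names (SchedulerStateSketch, hasExecutionGraph, runIfExecutionGraphExists) are hypothetical and are not Flink's API; the sketch also assumes that every state after Waiting for resources holds an EG.

```java
/** Hypothetical sketch of the scheduler states and the EG-existence guard. */
class SchedulerStateSketch {

    enum State {
        CREATED, WAITING_FOR_RESOURCES, EXECUTING, RESTARTING, CANCELING, FAILING, FINISHED;

        /** Assumption: an EG exists in every state after resources have been acquired. */
        boolean hasExecutionGraph() {
            return this != CREATED && this != WAITING_FOR_RESOURCES;
        }
    }

    private State state = State.CREATED;

    void transitionTo(State newState) {
        state = newState;
    }

    /** Runs an operation that needs the EG; ignores it (returns false) if no EG exists yet. */
    boolean runIfExecutionGraphExists(Runnable operation) {
        if (!state.hasExecutionGraph()) {
            return false; // Created or Waiting for resources: the operation is ignored
        }
        operation.run();
        return true;
    }

    public static void main(String[] args) {
        SchedulerStateSketch scheduler = new SchedulerStateSketch();
        System.out.println(scheduler.runIfExecutionGraphExists(() -> {})); // false
        scheduler.transitionTo(State.EXECUTING);
        System.out.println(scheduler.runIfExecutionGraphExists(() -> {})); // true
    }
}
```

Representing the "EG exists" property on the state itself keeps the check in one place instead of repeating it in every operation handler.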

Some operations are asynchronous (the resource timeout in the Waiting for resources state and the restart delay in the Restarting state) and remain valid only if no other state change has happened in the meantime. We therefore introduce a state version that is used to filter out such outdated operations.
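A minimal sketch of the state-version idea, assuming a single counter that is incremented on every state transition: an asynchronous callback captures the version at scheduling time and only runs if the version is unchanged when it fires. VersionedScheduler and guardedByCurrentVersion are hypothetical names, and states are plain strings for brevity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: filter out outdated asynchronous operations via a state version. */
class VersionedScheduler {
    private String state = "Created";
    private long stateVersion = 0;

    synchronized void transitionTo(String newState) {
        state = newState;
        stateVersion++; // every state change invalidates operations scheduled earlier
    }

    synchronized String getState() {
        return state;
    }

    /** Wraps an action so that it only runs if no state change happened in between. */
    synchronized Runnable guardedByCurrentVersion(Runnable action) {
        final long expectedVersion = stateVersion;
        return () -> {
            synchronized (this) {
                if (stateVersion == expectedVersion) {
                    action.run();
                }
                // otherwise the operation is outdated and dropped
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        VersionedScheduler scheduler = new VersionedScheduler();
        scheduler.transitionTo("Waiting for resources");
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Resource timeout scheduled while waiting for resources.
        timer.schedule(
                scheduler.guardedByCurrentVersion(
                        () -> System.out.println("resource timeout fired")),
                100, TimeUnit.MILLISECONDS);
        // Resources arrive before the timeout fires, so the timeout becomes outdated.
        scheduler.transitionTo("Executing");
        timer.shutdown(); // already scheduled delayed tasks still run by default
        timer.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(scheduler.getState()); // Executing
    }
}
```

A version counter avoids having to cancel every pending timer explicitly on each transition; stale callbacks simply become no-ops when they eventually fire.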

Components of the scheduler

The scheduler consists of the following components to accomplish its job:


PlantUML
@startuml

package "Declarative Scheduler" {
  [SlotAllocator]
  [ExecutionFailureHandler]
  [ScalingPolicy]
}

@enduml


SlotAllocator

ExecutionFailureHandler

...

ScalingPolicy

How to distinguish streaming jobs

...