Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In order to support the reactive mode (FLIP-159) we need a different type of scheduler which does not first decide first announces the required resources and only after having received the resources decides on the actual parallelism with which to execute a job and then asking for the resources but doing it vice versathe job. This has the benefit that this scheduler can schedule jobs if not all required resources are fulfilled.

Proposed Changes

The declarative scheduler will first work for streaming jobs only. This will simplify things considerably because there is only a single job pipelined region. Moreover, we could treat task failures as global failures which restart the whole topology. This is true for many streaming topologies anyways. Given these assumptions we could develop the following scheduler:

...

  • Global failover: Restart of the whole topology which allows to change the parallelism of the Jobjob
  • Local failover: Restart of a subset of the executions which does not change the parallelism of the operator

If the system cannot recover from a local failover because it does not have enough slots available, it must be escalated which makes it a global failover. A global failover will allow the system to rescale the whole job.

...