The original AIP-15 proposes using locking to enable multiple schedulers, which might introduce unnecessary complexity. Instead, I propose splitting the scheduler into a MainScheduler and a DagScheduler. This makes it possible to run multiple DagSchedulers at the same time, each submitted by the MainScheduler.


Benefits

Each DAG gets its own scheduler on demand. With multiple DAGs, multiple DagSchedulers can run at the same time. This considerably reduces the load on the MainScheduler, which will no longer be the blocking process of Airflow.

Processes

MainScheduler

This process should always run, like the current scheduler.

This process can run in a master/failover setup, or failover can be handled within k8s, as discussed in AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance.

Tasks

  • DAG syncing to database
    • This might also be separated into another daemonized process. Whether it is included in or excluded from the MainScheduler should be configurable.
  • Submitting DagScheduler
    • Only DagModel and DagRun table should be required here. The DAG object is not required here.
    • Conditions to submit a DagScheduler:
      • If there is a running DagRun
      • If a new DagRun should be scheduled
      • No DagScheduler is already active for this DAG
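
The submission conditions above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: `DagState` and its fields are hypothetical names standing in for values that would be read from the DagModel and DagRun tables, and the sketch assumes the conditions combine as "(running DagRun or new DagRun due) and no active DagScheduler".

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DagState:
    """Hypothetical per-DAG snapshot built from the DagModel and DagRun tables only."""
    dag_id: str
    has_running_dag_run: bool    # there is a running DagRun
    new_dag_run_due: bool        # a new DagRun should be scheduled
    dag_scheduler_active: bool   # a DagScheduler is already active for this DAG


def should_submit_dag_scheduler(state: DagState) -> bool:
    # Assumed reading of the conditions: there must be work to do,
    # and no DagScheduler may already be active for the DAG.
    has_work = state.has_running_dag_run or state.new_dag_run_due
    return has_work and not state.dag_scheduler_active


def dags_to_submit(states: Iterable[DagState]) -> List[str]:
    """Return the ids of the DAGs for which a DagScheduler should be submitted."""
    return [s.dag_id for s in states if should_submit_dag_scheduler(s)]
```

Note that the decision needs no DAG object, only the flags derived from the two tables.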

DagScheduler

This process runs only on demand and is executed by an Airflow executor, for example on a Celery worker.

A DagScheduler should handle only a single DAG and a single cycle. When a cycle is done, the MainScheduler should submit a new DagScheduler.

Tasks

  • Create a DagRun when required
  • Check running TaskInstances and submit new TaskInstances when required
  • Set the status of a DagRun to success or failed when required
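
A single DagScheduler cycle covering the three tasks above might look like the following sketch. It uses plain in-memory structures rather than Airflow's models: the `DagRun` class, the `upstream` mapping, the state strings and `run_single_cycle` are all hypothetical, and the dependency rule (submit a task once all of its upstream tasks have succeeded) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DagRun:
    """Hypothetical stand-in for a DagRun and its TaskInstances."""
    task_states: Dict[str, str]  # task_id -> "none", "running", "success" or "failed"
    state: str = "running"


def run_single_cycle(
    upstream: Dict[str, List[str]],
    dag_run: Optional[DagRun],
    new_run_due: bool,
) -> Tuple[Optional[DagRun], List[str]]:
    """Run one cycle for one DAG; return the DagRun and the task ids submitted."""
    # 1. Create a DagRun when required.
    if dag_run is None:
        if not new_run_due:
            return None, []
        dag_run = DagRun(task_states={task_id: "none" for task_id in upstream})

    # 2. Check task instances and submit those whose upstream tasks all succeeded.
    submitted = [
        task_id
        for task_id, deps in upstream.items()
        if dag_run.task_states[task_id] == "none"
        and all(dag_run.task_states[dep] == "success" for dep in deps)
    ]
    for task_id in submitted:
        dag_run.task_states[task_id] = "running"  # handed to the executor

    # 3. Set the DagRun state to success or failed when required.
    states = list(dag_run.task_states.values())
    if any(s == "failed" for s in states):
        dag_run.state = "failed"
    elif all(s == "success" for s in states):
        dag_run.state = "success"
    return dag_run, submitted
```

Because the function performs exactly one pass and then returns, the MainScheduler stays responsible for deciding whether another DagScheduler is needed.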