Motivation

By default, users launch a single scheduler instance for Airflow. This raises a few concerns, including:

  • High Availability: if the single scheduler goes down, scheduling stops.
  • Scheduling Performance: the scheduling latency for each DAG may be long when there are many DAGs.


Ideally, Airflow would support running multiple schedulers to address these concerns.

Considerations

1. The `scheduler_lock` field already exists in DagModel, but the current implementation of Airflow does not use it (as of https://github.com/apache/airflow/tree/45d24e79eab98589b1b0509e920811cbf778048b). We should leverage it and modify the scheduler code accordingly.

2. To avoid the leader-election problem, we may not want to use a master-slave architecture for the schedulers. Instead, we simply start multiple schedulers. The probability of schedulers competing with each other is low (https://lists.apache.org/thread.html/389287b628786c6144c0b8e6abf74a040890cd9410a5abe6e968eb55@%3Cdev.airflow.apache.org%3E).

3. To avoid "correlation" between schedulers, we may want to randomly shuffle the list of DAG files before it is passed to the scheduler process (https://lists.apache.org/thread.html/e21d028944092b588295112acb9a3e203c4aea7fae50978f288c2af1@%3Cdev.airflow.apache.org%3E).

4. An important part of this AIP's scope is to test intensively whether running multiple schedulers causes any issues (after all the concerns above are addressed).
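
Considerations 1 and 3 can be illustrated with a minimal sketch. This is not Airflow's scheduler code: the in-memory SQLite table `dag` is a simplified stand-in for DagModel, and the helper names (`make_demo_db`, `try_lock_dag`, `unlock_dag`, `shuffled_dag_files`) are hypothetical. It only shows the general idea of claiming a DAG through a conditional update on a `scheduler_lock` flag, and of shuffling the DAG file list so that each scheduler walks the files in a different order.

```python
import random
import sqlite3


def make_demo_db(dag_ids):
    """Create an in-memory table standing in for DagModel (illustrative schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag (dag_id TEXT PRIMARY KEY, scheduler_lock INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dag (dag_id, scheduler_lock) VALUES (?, 0)",
        [(dag_id,) for dag_id in dag_ids],
    )
    conn.commit()
    return conn


def try_lock_dag(conn, dag_id):
    """Try to claim a DAG; return True only if this caller set the lock.

    The WHERE clause makes the update conditional, so a second caller
    that arrives while the lock is held updates zero rows.
    """
    cur = conn.execute(
        "UPDATE dag SET scheduler_lock = 1 "
        "WHERE dag_id = ? AND (scheduler_lock IS NULL OR scheduler_lock = 0)",
        (dag_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def unlock_dag(conn, dag_id):
    """Release the lock so another scheduler can pick the DAG up."""
    conn.execute("UPDATE dag SET scheduler_lock = 0 WHERE dag_id = ?", (dag_id,))
    conn.commit()


def shuffled_dag_files(file_paths, seed=None):
    """Return a shuffled copy of the DAG file list; the input is not modified."""
    rng = random.Random(seed)
    shuffled = list(file_paths)
    rng.shuffle(shuffled)
    return shuffled
```

In this sketch, each scheduler would call `shuffled_dag_files` on its own copy of the file list and then skip any DAG for which `try_lock_dag` returns False. A real implementation would also need to handle locks left behind by a crashed scheduler, which this sketch does not attempt.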
