
Status

State: Draft
Discussion Thread

Multiple Schedulers - "scheduler_lock"

A Naive Multi-Scheduler Architecture Experiment of Airflow


Motivation

By default, users launch a single scheduler instance for Airflow. This raises a few concerns, including:

  • High Availability: if the single scheduler goes down, nothing gets scheduled.
  • Scheduling Performance: the scheduling latency for each DAG may be long if there are many DAGs.


It would be ideal for Airflow to support multiple schedulers, to address these concerns.

There is a "hacky" method of starting multiple schedulers and letting each handle a specific set of DAGs. It does improve scheduling performance, but it doesn't address the HA concern, because a DAG set stops being scheduled if its scheduler goes down.

Considerations

1. `scheduler_lock` already exists in DagModel, but it is not used in the current implementation of Airflow (as of https://github.com/apache/airflow/tree/45d24e79eab98589b1b0509e920811cbf778048b). We should leverage it and modify the scheduler code accordingly.

2. To avoid the leader-election problem, we may not want to use a master-slave architecture for schedulers. Instead, we simply start multiple schedulers that run as peers.

The probability of schedulers competing for the same DAG is easy to calculate, since it is a typical Birthday Problem. Assuming each scheduler picks a DAG uniformly at random, the probability that at least two schedulers compete for the same DAG is 1 - m! / ((m-n)! * m^n), where m is the number of DAGs and n is the number of schedulers. This probability is reasonably low as long as the ratio of DAGs to schedulers is not too small.


Let’s say we have 200 DAGs and we start 2 schedulers. At any moment, the probability that the schedulers are competing for the same DAG is only 0.5%. If we run 2 schedulers against 300 DAGs, this probability is only 0.33% (https://lists.apache.org/thread.html/389287b628786c6144c0b8e6abf74a040890cd9410a5abe6e968eb55@%3Cdev.airflow.apache.org%3E).
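
The figures above can be checked with a short script. This is only a sketch of the Birthday Problem formula; the function name `collision_probability` is hypothetical, and it assumes each scheduler picks one DAG uniformly at random and independently of the others.

```python
from math import prod


def collision_probability(m: int, n: int) -> float:
    """Probability that at least two of n schedulers pick the same one of m DAGs.

    Assumes every scheduler picks a DAG uniformly at random and independently.
    """
    if n > m:
        # More schedulers than DAGs: a collision is certain (pigeonhole).
        return 1.0
    # m! / ((m-n)! * m^n), written as a product to avoid huge factorials.
    p_all_distinct = prod((m - i) / m for i in range(n))
    return 1.0 - p_all_distinct


for m, n in [(200, 2), (300, 2), (200, 4)]:
    print(f"{m} DAGs, {n} schedulers: {collision_probability(m, n):.2%}")
```

The first two cases correspond to the 0.5% and 0.33% quoted above; the third shows how the probability grows as more schedulers are added for the same number of DAGs.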

3. To avoid "correlation" between schedulers (all of them walking the DAG files in the same order), we may want to randomly shuffle the list of DAG files before it is passed to the scheduler process (https://lists.apache.org/thread.html/e21d028944092b588295112acb9a3e203c4aea7fae50978f288c2af1@%3Cdev.airflow.apache.org%3E).
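
A minimal sketch of the shuffling idea, assuming the DAG files are available as a plain list of paths. The helper name `shuffled_dag_files` and the example paths are hypothetical and are not part of Airflow.

```python
import random
from typing import List, Optional


def shuffled_dag_files(dag_files: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a randomly ordered copy of dag_files; the input list is left untouched."""
    # Each scheduler uses its own RNG so that schedulers do not share an ordering.
    rng = random.Random(seed)
    shuffled = list(dag_files)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical file list, for illustration only.
files = [f"dags/dag_{i}.py" for i in range(5)]
print(shuffled_dag_files(files))
```

Each scheduler would call this independently before handing the list to its processing loop, so two schedulers are unlikely to visit the DAG files in the same order.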

4. Another method to avoid schedulers competing with each other is to let each scheduler select the DAG that is not locked and has gone unprocessed for the longest time (https://lists.apache.org/thread.html/6021f5f8324dd7e7790b0b1903e3034d2325e21feba5aef15084eb17@%3Cdev.airflow.apache.org%3E).
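
The selection rule in point 4 could be sketched as follows. This is not Airflow's implementation: the table `dag`, the column `last_scheduler_run`, and the functions `claim_next_dag` and `release_dag` are assumptions made for illustration, and an in-memory SQLite database stands in for the metadata database. The lock is taken with a conditional UPDATE, so only one scheduler can win a given DAG.

```python
import sqlite3
from typing import Optional


def claim_next_dag(conn: sqlite3.Connection) -> Optional[str]:
    """Lock and return the unlocked DAG that was processed longest ago, or None."""
    while True:
        row = conn.execute(
            "SELECT dag_id FROM dag WHERE scheduler_lock = 0 "
            "ORDER BY last_scheduler_run ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # every DAG is currently locked
        # Conditional update: succeeds only if nobody locked the DAG in between.
        cur = conn.execute(
            "UPDATE dag SET scheduler_lock = 1 "
            "WHERE dag_id = ? AND scheduler_lock = 0",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Another scheduler won the race; look for the next candidate.


def release_dag(conn: sqlite3.Connection, dag_id: str, now: float) -> None:
    """Unlock a DAG and record when it was last processed."""
    conn.execute(
        "UPDATE dag SET scheduler_lock = 0, last_scheduler_run = ? WHERE dag_id = ?",
        (now, dag_id),
    )
    conn.commit()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag (dag_id TEXT PRIMARY KEY, "
    "scheduler_lock INTEGER NOT NULL DEFAULT 0, "
    "last_scheduler_run REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [("a", 0, 30.0), ("b", 0, 10.0), ("c", 1, 5.0)],
)
conn.commit()

first = claim_next_dag(conn)   # oldest unlocked DAG
second = claim_next_dag(conn)  # next oldest unlocked DAG
third = claim_next_dag(conn)   # nothing left to claim
release_dag(conn, first, now=40.0)
print(first, second, third)
```

A real deployment would also need a way to release locks held by a scheduler that crashed, which this sketch does not cover.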

5. A few points to keep in mind when we manage scheduler_lock:

6. An important part of this AIP's scope is to test extensively whether running multiple schedulers causes any issues (after all the concerns above have been addressed).
