Overview
To support the needs of a vibrant and growing community, we plan to continuously improve Airflow by:
- Adding features already offered by existing workflow solutions (i.e., we need to add expected features)
- Adding features that set Airflow apart from the herd (i.e., we need to add killer features)
We count on input from the community in the form of JIRA issues to assess common pain-points. The greater the impact of a particular pain-point, the greater the need to focus the efforts of one or many developers on resolving it. Some pain points will be solved over several commits from several developers. Road Map items below are meant to capture pain-points that may span several reported issues. In order to ensure these Road Map items are addressed, each one needs a champion.
Champions are typically committers, but don't need to be. Each Road Map item requires a commitment from its champion to corral the efforts of one or more developers and drive the item's development. Similarly, a developer who is interested in a particular area may contact the champion to be assigned work.
Roadmap
General
More Frequent Release Cycles - Champion : Max
Fault-isolation & dependency isolation
- via better packaging/execution - Champion : Max
Improving Testing
Documentation
Airflow is a broad platform, and documentation is critical not only for getting new users up and running but also for helping users discover and use all of Airflow's features.
- Add DAG Development Workflow - Champion : Sid
Feature Request: Why is my task not scheduled?
Python API
DAGs are code; the easier that code is to write, the better.
Rethink Start Date Handling - Champion : Jeremiah
- The start date is confusing and not consistently handled or exposed throughout the web app. If I recall correctly, in some places, the execution date and start date are reversed.
Remove Start_Date & Interval from the DAG and let them be set by a UI calendar widget
- This way, they will only be set to allowed values!
Remove Dependence on running everything in UTC or in a single TZ
Streamline workflow - Champion : Jeremiah
- See proposal
DAGRun-level triggers - Champion : Jeremiah
- Would be helpful to tie Operators to DagRun state somehow, so they could act as a cleanup. For example, say a DAG begins by launching a cluster, then fails while trying to execute a command on the cluster. The cleanup Operator would make sure the cluster was properly shut down. This could be mimicked today with a "one_failed" trigger attached to every node in the DAG.
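The "one_failed" workaround described above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Airflow's API: the `State` enum, the `one_failed` and `cleanup_should_run` helpers, and the task names are all assumptions made up for the example.

```python
from enum import Enum


class State(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    PENDING = "pending"


def one_failed(upstream_states):
    """Return True as soon as any upstream task has failed."""
    return any(state is State.FAILED for state in upstream_states)


def cleanup_should_run(task_states):
    # The cleanup task is modelled as downstream of every other task,
    # so its upstream set is simply all task states in the run.
    return one_failed(task_states.values())


# Hypothetical run: the cluster launched, then a command on it failed.
failed_run = {
    "launch_cluster": State.SUCCESS,
    "run_command": State.FAILED,
    "publish_results": State.PENDING,
}
healthy_run = {name: State.SUCCESS for name in failed_run}

print(cleanup_should_run(failed_run))
print(cleanup_should_run(healthy_run))
```

In this model the cleanup fires only when some task has failed, and it has to be wired to every node by hand. A DagRun-level trigger would express the same intent once, against the state of the run as a whole.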
Execution
The heart and soul of Airflow.
- Fundamentally improving the scheduler
- Improving the scheduler by making dag runs more coherent
Ensure correct handling of Skipped tasks - Champion : Sid
- depends_on_past=True and Skipped : 1155 (Fixed)
- DAGs or Tasks that we would like to manually skip, as described in 262
- DAGs that specify only_run_latest, as in 59
- In these cases, we want to skip DAG Runs except for the latest. Unfortunately, if we skip all but the latest, the latest will not run if depends_on_past=True
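The interaction between skipped runs and depends_on_past=True can be sketched as follows. This is a simplified model rather than the scheduler's actual logic: the function names, the string states, and the `skipped_counts_as_done` switch are assumptions introduced for illustration.

```python
def previous_run_satisfies(prev_state, skipped_counts_as_done):
    """Decide whether the previous run allows the next one to start."""
    if prev_state is None:
        # No earlier run exists, so there is nothing to depend on.
        return True
    if prev_state == "success":
        return True
    if prev_state == "skipped":
        return skipped_counts_as_done
    return False


def latest_can_run(earlier_states, depends_on_past, skipped_counts_as_done):
    """Check whether the latest run may start given the earlier run states."""
    if not depends_on_past:
        return True
    prev_state = earlier_states[-1] if earlier_states else None
    return previous_run_satisfies(prev_state, skipped_counts_as_done)


# Four earlier runs were skipped so that only the latest would execute.
earlier = ["skipped"] * 4

# If a skipped run does not satisfy depends_on_past, the latest run is blocked.
print(latest_can_run(earlier, depends_on_past=True, skipped_counts_as_done=False))

# If a skipped run is treated as satisfying the dependency, the latest run proceeds.
print(latest_can_run(earlier, depends_on_past=True, skipped_counts_as_done=True))
```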
Concurrency Limits Not Honored : max active, concurrency, pool
- Some overlap with the following item!
- https://github.com/airbnb/airflow/issues/1057 (pool over-subscription)
- https://github.com/airbnb/airflow/issues/1085 (concurrency not honored)
Backfill offers a parallel code path to scheduling - Champion : Jeremiah & Sid
- As such, it may not honor concurrency parameters
- https://github.com/airbnb/airflow/issues/1057 (pool over-subscription)
- https://github.com/airbnb/airflow/issues/1085 (concurrency not honored)
- Backfill should add DagRuns and defer to the scheduler.
- This will avoid incomplete duplication of logic between Backfill and Scheduler.
- This will also expose all runs in the DAGRuns view, not just the ones created/used by the scheduler
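One way to picture "Backfill should add DagRuns and defer to the scheduler" is a single code path that owns the concurrency check. The sketch below is a toy model under stated assumptions: the `Scheduler` class, its `max_active_runs` limit, and the `backfill` helper are hypothetical names and do not reflect Airflow's internals.

```python
from collections import deque


class Scheduler:
    """Toy scheduler: the only place where the concurrency limit is enforced."""

    def __init__(self, max_active_runs):
        self.max_active_runs = max_active_runs
        self.queued = deque()
        self.active = []

    def add_dag_run(self, run_id, source):
        # Every run is recorded the same way, whatever created it.
        self.queued.append((run_id, source))

    def tick(self):
        # Promote queued runs only while there is spare capacity.
        while self.queued and len(self.active) < self.max_active_runs:
            self.active.append(self.queued.popleft())
        return list(self.active)

    def finish(self, run_id):
        self.active = [run for run in self.active if run[0] != run_id]


def backfill(scheduler, run_ids):
    # Backfill executes nothing itself; it only registers runs.
    for run_id in run_ids:
        scheduler.add_dag_run(run_id, source="backfill")


scheduler = Scheduler(max_active_runs=2)
backfill(scheduler, ["day1", "day2", "day3"])
scheduler.add_dag_run("day4", source="scheduled")

first = scheduler.tick()
scheduler.finish("day1")
second = scheduler.tick()
print(first)
print(second)
```

Because backfilled and scheduled runs pass through the same queue, the limit cannot be bypassed by one path, and every run is visible in one place.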
Only Run Latest - Champion : Sid
- For cases where we need to run only the latest in a series of task instance runs and mark the others as skipped. For example, we may have a job that executes a DB snapshot every day. If the DAG is paused for 5 days and then unpaused, we don't want to run all 5, just the latest. With this feature, we will provide "cron" functionality for task scheduling that is not related to ETL.
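The catch-up behaviour described above can be sketched as a small planning function. This is a simplified illustration, not the proposed implementation: `plan_catch_up` and the dates are hypothetical, and the model ignores details such as how Airflow assigns execution dates to schedule periods.

```python
from datetime import datetime, timedelta


def plan_catch_up(last_run, now, interval):
    """Split the runs that came due since last_run into (skipped, latest)."""
    due = []
    next_run = last_run + interval
    while next_run <= now:
        due.append(next_run)
        next_run += interval
    if not due:
        return [], None
    # Everything except the most recent due run is marked as skipped.
    return due[:-1], due[-1]


# Hypothetical daily snapshot job that was paused for 5 days.
last_run = datetime(2016, 1, 1)
unpaused_at = datetime(2016, 1, 6)
skipped, latest = plan_catch_up(last_run, unpaused_at, timedelta(days=1))

print(len(skipped))
print(latest)
```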
Backfill Oddity : needs one successful run!
Security
- Kerberize the API and web access - Champion: Bolke
- Role based access to Web (UI) - Champion: Chris
CLI
- CLI to use API
- Ability to delete DAGs
UI
Revamp Connections UI
- Increasingly, connections are putting fields in "extras", which works but means the correct fields are almost impossible to discover for new users. JDBCHook and GCloud hack the UI screen to show fields which are then automatically put into "extras", and that behavior should be supported more widely.
Ability to delete DAGs
Apache
- Move towards Apache-community Friendly Licensed Dependencies
- Survey all of our dependencies and ensure they meet Apache licensing requirements (see the Apache License Black List)
Deprecated Features
- Break API compatibility on major releases in order to deprecate features