Status
State: Draft
Discussion thread:
JIRA:
Motivation
Airflow runs arbitrary code across different workers. Currently, every task has full access to the Airflow database including connection details like usernames, passwords etc. This makes it quite hard to deploy Airflow in environments that are multi-tenant or semi-multi-tenant. Next to that there is no mechanism in place that ensures that what the scheduler thinks it is scheduling is also the thing that is running at the worker. This creates to additional operational risk of running an out of date task that does something else than expected, aside from the security risk of a malicious task.
Challenges
DAGs can be generated by DAG factories essentially creating a kind of sub-DSL in which DAGs are defined. From Airflow’s point of view this creates a challenge as DAGs therefore are able to pull in arbitrary dependencies.
Envisioned workflow
DAG submission
The DAG directory should be fully managed by Airflow. DAGs should be submitted by authorized users and ownership