Status
State: Draft
Discussion thread:
JIRA:
Motivation
Operations on an Airflow server currently have to be run through the CLI, which either interacts directly with the database or, in JSON mode, performs a subset of operations via the existing experimental API. This complicates authentication and interaction with the Airflow installation, because the CLI requires a database user rather than a web user.
Requirements
Reduce complexity in handling authentication.
Provide a layer of abstraction using an industry standard interface to allow authenticated, remote control of Airflow installations.
Any API should conform to existing industry standards.
API structure should be discoverable.
API should be versioned to handle any future backwards incompatible schema changes.
Proposal
Develop a JSON RESTful API using the existing Plugin interface, defined by OpenAPI 3.
Implementation
API Definition
The API will be defined by a YAML OpenAPI 3 definition, which will be exposed via connexion. For the time being, the API JSON data structures will follow the existing definitions as defined in https://airflow.apache.org/api.html.
This means that client libraries can be autogenerated, so development work is focused on the API structure and the method handlers themselves, not on the infrastructure or client libraries. This also opens Airflow up to being controlled from many languages and interfaces with minimal work required on the server side.
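As a rough illustration of what such a definition contains, the sketch below builds a minimal OpenAPI 3 document as a Python dictionary and lists the operations that a server framework or client generator would discover from it. The real definition would be a YAML file; the operationId values (api.handlers.get_dag_runs, api.handlers.trigger_dag), the title and the response descriptions are hypothetical and are not taken from the proof of concept.

```python
import json

# Minimal OpenAPI 3 document covering one existing endpoint.
# The operationId values are hypothetical handler names, used only for illustration.
SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Airflow API (sketch)", "version": "1"},
    "paths": {
        "/dags/{dag_id}/dag_runs": {
            "parameters": [
                {
                    "name": "dag_id",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }
            ],
            "get": {
                "operationId": "api.handlers.get_dag_runs",
                "responses": {"200": {"description": "List of DAG runs"}},
            },
            "post": {
                "operationId": "api.handlers.trigger_dag",
                "responses": {"200": {"description": "DAG run created"}},
            },
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path, operationId) tuples for every operation."""
    operations = []
    for path, item in spec["paths"].items():
        for method, operation in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                operations.append((method.upper(), path, operation["operationId"]))
    return sorted(operations)


if __name__ == "__main__":
    print(json.dumps(list_operations(SPEC), indent=2))
```

Because handlers are referenced only by operationId, adding an endpoint in this style means adding a path entry and a handler function, with no routing code to maintain by hand.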
The API base URL will have the format {protocol}://{airflowHost}/api/v{apiVersion}/, where:
protocol is one of http or https.
airflowHost is AIRFLOW__WEBSERVER__BASE_URL.
apiVersion is an integer.
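The base URL format can be sketched as a small helper. This is only an illustration: the function name api_base_url and the host airflow.example.com:8080 are hypothetical, the host is assumed to be passed without a scheme, and versions are assumed to start at 1.

```python
def api_base_url(protocol, airflow_host, api_version):
    """Build the API base URL: {protocol}://{airflowHost}/api/v{apiVersion}/."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(api_version, bool) or not isinstance(api_version, int):
        raise ValueError("api_version must be an integer")
    if api_version < 1:  # assumption: versions are numbered from 1
        raise ValueError("api_version must be 1 or greater")
    return "{}://{}/api/v{}/".format(protocol, airflow_host.strip("/"), api_version)


if __name__ == "__main__":
    # Hypothetical host, for illustration only.
    print(api_base_url("https", "airflow.example.com:8080", 1))
```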
Endpoints
For the initial work, the existing endpoints should not be modified, so that a minor release is not required to get this structure into the code base.
In the interest of maintaining backwards compatibility, the existing API would be kept with the same structure at the same endpoints, but served from the OpenAPI codebase until it is deprecated in a future release. In parallel, a new API would be defined at /api/v1 with the Additional API resources.
Existing
The base URL will prefix all endpoints. There will be significantly more endpoints to cover Airflow functionality, but this list covers the existing experimental API with some extra intermediary endpoints.
/dags/{dag_id}/dag_runs
/dags/{dag_id}/dag_runs/{execution_date}
/test
/dags/{dag_id}/tasks/{task_id}
/dags/{dag_id}/dag_runs/{execution_date}/tasks/{task_id}
/dags/{dag_id}/paused/{paused}
/latest_runs
/pools
/pools/{pool_name}
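To show how the templated paths above map onto concrete requests, here is a minimal sketch of template matching using only the standard library. In the proposal this work would be done by connexion; the helper names compile_template and match are hypothetical, and the sketch assumes parameter values never contain a slash.

```python
import re


def compile_template(template):
    """Turn a path template such as /pools/{pool_name} into a compiled regex."""
    pattern = ""
    # re.split with a capturing group yields literal text at even indexes
    # and parameter names at odd indexes.
    for index, part in enumerate(re.split(r"\{(\w+)\}", template)):
        if index % 2:
            pattern += "(?P<{}>[^/]+)".format(part)
        else:
            pattern += re.escape(part)
    return re.compile("^" + pattern + "$")


TEMPLATES = [
    "/dags/{dag_id}/dag_runs",
    "/dags/{dag_id}/dag_runs/{execution_date}",
    "/test",
    "/dags/{dag_id}/tasks/{task_id}",
    "/dags/{dag_id}/dag_runs/{execution_date}/tasks/{task_id}",
    "/dags/{dag_id}/paused/{paused}",
    "/latest_runs",
    "/pools",
    "/pools/{pool_name}",
]

ROUTES = [(template, compile_template(template)) for template in TEMPLATES]


def match(path):
    """Return (template, params) for the first matching template, or None."""
    for template, regex in ROUTES:
        found = regex.match(path)
        if found:
            return template, found.groupdict()
    return None


if __name__ == "__main__":
    print(match("/dags/example_dag/dag_runs/2019-02-01T00:00:00/tasks/run_this"))
```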
Additional
These are suggested endpoints to replace existing ones, as they are more in line with the concept of resources rather than actions.
/dags
/dags/{dag_id}
/dags/{dag_id}/dag_runs/{execution_date}/tasks
/dag_runs - Alias to /latest_runs
- Alias to
/dags/{dag_id}/tasks
/healthcheck - Alias to /test
/dags/{dag_id}/status - Intended to replace /dags/{dag_id}/paused/{paused} as a RESTful resource as opposed to the single field.
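One way to picture the relationship between the resource-style endpoints and the existing ones is a small translation layer, sketched below. The alias table follows the list above; the shape of the status document (dag_id and paused fields) and the helper names are hypothetical, and the sketch assumes the existing {paused} path segment takes the strings true or false.

```python
# Resource-style paths mapped to the existing endpoints they alias.
ALIASES = {
    "/dag_runs": "/latest_runs",
    "/healthcheck": "/test",
}


def resolve_alias(path):
    """Return the existing endpoint for an aliased path, or the path unchanged."""
    return ALIASES.get(path, path)


def legacy_paused_path(status):
    """Translate a status resource into the existing action-style path.

    `status` is a hypothetical document for /dags/{dag_id}/status, e.g.
    {"dag_id": "example_dag", "paused": True}.
    """
    # Assumption: the existing endpoint expects "true" or "false" in the path.
    flag = "true" if status["paused"] else "false"
    return "/dags/{}/paused/{}".format(status["dag_id"], flag)


if __name__ == "__main__":
    print(resolve_alias("/healthcheck"))
    print(legacy_paused_path({"dag_id": "example_dag", "paused": True}))
```

Modelling the paused state as a field of a status resource leaves room for further fields later without adding a new action-style endpoint for each one.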
Discovery
To make the API discoverable, connexion provides a self-documenting API UI with Swagger. Beyond this, the API structure itself should be discoverable by implementing a concept such as HATEOAS via HAL.
Each resource or resource listing response should follow the structure defined in http://stateless.co/hal_specification.html.
NOTE: The exact structure of _links and _curies is still to be defined.
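As a sketch of what a HAL-style response could look like for a DAG run, the example below builds a single resource and a resource listing. Only the reserved _links and _embedded properties come from the HAL specification; the link relation names (dag, tasks), the embedded key dag_runs and the resource fields are hypothetical, since the exact structure is still to be defined.

```python
import json


def hal_dag_run(dag_id, execution_date, state):
    """Build a HAL-style document for a single DAG run."""
    dag_href = "/dags/{}".format(dag_id)
    run_href = "{}/dag_runs/{}".format(dag_href, execution_date)
    return {
        "_links": {
            "self": {"href": run_href},
            # Hypothetical link relations pointing at related resources.
            "dag": {"href": dag_href},
            "tasks": {"href": run_href + "/tasks"},
        },
        "dag_id": dag_id,
        "execution_date": execution_date,
        "state": state,
    }


def hal_dag_runs(dag_id, runs):
    """Build a HAL-style listing; `runs` is a list of (execution_date, state)."""
    return {
        "_links": {"self": {"href": "/dags/{}/dag_runs".format(dag_id)}},
        "_embedded": {
            "dag_runs": [hal_dag_run(dag_id, date, state) for date, state in runs]
        },
    }


if __name__ == "__main__":
    listing = hal_dag_runs("example_dag", [("2019-02-01T00:00:00", "success")])
    print(json.dumps(listing, indent=2))
```

A client that starts from the listing can follow the self, dag and tasks links without constructing any URLs itself, which is the discoverability property HATEOAS aims for.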
Proof of concept
There is an example of this implementation here: https://github.com/apache/airflow/pull/4640/files
Important points to note are:
Using plugin entrypoint to isolate functionality https://github.com/apache/airflow/pull/4640/files#diff-2eeaed663bd0d25b7e608891384b7298R416
OpenAPI 3 definition https://github.com/apache/airflow/pull/4640/files#diff-93e827c54cbc441d84674c814dcae00e
API plugin and blueprint hooks https://github.com/apache/airflow/pull/4640/files#diff-5ff8468ade348aeb2ccc273cf3b79550
The sample documentation can be seen here:
Considerations
Using a structured definition like OpenAPI may make it harder to support edge cases such as complex, non-JSON or non-REST endpoints in the API.
Complex authentication methods may also be difficult to implement. The existing PoC above handles authentication using the existing API authentication methods. In future, API keys or OAuth keys may be a better solution for API access, instead of requiring a session or an OAuth login.