Status
State: Draft
Discussion thread: https://lists.apache.org/thread.html/219e8cf818ae2803920a198d11e98c5e1aecd3c80cf774eb0cfe70eb@%3Cdev.airflow.apache.org%3E
JIRA: AIRFLOW-3081
Motivation
In the Apache Airflow project, contributors need to run system tests against external systems (for example Google Cloud Platform) automatically, specifically before any pending change is merged to the main repository.
We already have a community-shared way to run unit tests automatically for Apache Airflow. The approach for contributing to Airflow (as described in the CONTRIBUTING documentation) is to create your own fork with your own copy of the Travis CI project, which runs the unit tests automatically.
Airflow has CI scripts and an environment that allow Travis CI to run unit tests automatically, but neither system tests nor any other tests that require communication with a real external system, such as a Google Cloud Platform project, are executed.
There is currently no way, shared with the community, to run such System Tests automatically.
Examples of such System Test DAGs are those created during the development of the Google Cloud Platform operators (currently in the CLOUD_BUILD branch, which will hopefully soon be merged to master):
- Google Compute Engine operator examples - including Instance Group Management
- Google Cloud Function operator examples
- Google Cloud Spanner operator examples
- Google Cloud SQL operator examples - including Google Cloud SQL Query operator
- Google Cloud Storage ACL operator examples
- Google Cloud Bigtable operator examples
Those DAGs are used for two purposes:
- they are used as example documentation sources. For example, the documentation of the Google Compute Engine operators is generated using the examples.
- they are actually runnable examples - provided that the environment variables are configured properly and authentication works.
The tests can be run through Airflow, and they should succeed by performing the full lifecycle of the service in question (Compute Instance, Cloud Function etc.). The runs of those examples have been wrapped in unit-test-like system test classes that are ignored by default but can be run automatically when the proper variables are set. They also have helpers that automatically set up and tear down the costly environment needed for such service tests:
- Compute System Test and Compute System Test Helper
- Cloud Function System Test
- Spanner System Test and Spanner System Test Helper
- Cloud SQL System Test and Cloud SQL System Test Helper
- Cloud SQL Query System Test and Cloud SQL Query System Test Helper
- Cloud Storage ACL System Test
- Bigtable System Test and Bigtable System Test Helper
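The "ignored by default, runnable when variables are set" behaviour described above can be illustrated with a minimal sketch. It is not the actual code of the system test classes listed here: the environment variable name, the helper class, and the test class are all hypothetical and serve only to show the pattern of a conditionally skipped test with set-up and tear-down of a costly resource.

```python
import os
import unittest

# Hypothetical variable name, used here only for illustration.
CREDENTIALS_ENV_VAR = "EXAMPLE_GCP_SERVICE_ACCOUNT_KEY"


def system_tests_enabled(environ=None):
    """Return True only when the credentials variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(CREDENTIALS_ENV_VAR, "").strip())


class ExampleHelper:
    """Hypothetical stand-in for a helper that manages costly resources."""

    def __init__(self):
        self.resources = []

    def set_up(self):
        # A real helper would create cloud resources here.
        self.resources.append("example-instance")

    def tear_down(self):
        # A real helper would delete the cloud resources here.
        self.resources.clear()


@unittest.skipUnless(system_tests_enabled(), "system test credentials not configured")
class ExampleSystemTest(unittest.TestCase):
    """Skipped by default; runs only when the credentials variable is set."""

    def setUp(self):
        self.helper = ExampleHelper()
        self.helper.set_up()

    def tearDown(self):
        self.helper.tear_down()

    def test_full_lifecycle(self):
        # A real system test would trigger the example DAG at this point.
        self.assertEqual(self.helper.resources, ["example-instance"])
```

With this pattern, a plain unit test run reports the class as skipped, while a run with the variable exported executes the full set-up, test, and tear-down cycle.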
As part of the Google Cloud operators implementation, a Cloud Build configuration was also implemented that allows all the System Tests to be run automatically using a privately owned/billed Google Cloud Platform project. Such a build also requires integration with the Airflow Breeze development environment, which was developed for this specific purpose: to help with faster development of Google Cloud related operators. The design of the Breeze environment is here, and it covers two usages of the environment: support for Cloud Build, and support for a local development workflow, which might become the base for, or be merged with, the AIP-7 Simplified development workflow work.
It would be a great improvement in quality if such system tests were executed automatically before any merge to the main project.
Running System Tests for Google Cloud Platform requires a Google Cloud Platform project with billing enabled and appropriate service accounts that have the permissions necessary to perform those operations. This can be either a private account of the developer/team developing the operators, or eventually the Apache Airflow community could have a shared GCP project to run such tests automatically on approved pull requests before merge.
A similar approach could be reused for other cloud/external service operators, not only for Google Cloud Platform.
Considerations
Requirements/Constraints
For now we can focus only on Google Cloud Platform operators and later reuse the learnings for other clouds/external services.
There are several services (and more coming) for Google Cloud Platform. Sharing this project (and the associated service account(s)) is potentially dangerous if anyone can get the credentials and use the service accounts. This means that forked/private repositories should use their own GCP projects and service accounts and set up Travis CI to use those for test executions. This should be configurable but easy to share within the team working on the same fork.
Eventually a shared GCP project/service account might be used to run tests for the main repository before the merge to master happens. That would be a sanity check verifying that no special/forgotten setup in the personal GCP projects prevents those tests from running for others.
The tests in the main GCP project should only be run after at least a code review and possibly some kind of automated “vulnerability” inspection that could prevent attempts to abuse the GCP environment. Adversarial attacks on open-source infrastructure have recently become a powerful hacking technique and are recognised as a powerful attack vector that is traditionally difficult to defend against: community/open-source projects are often rather relaxed about security, yet they are sometimes used in millions of commercial installations. Attacking open-source infrastructure is usually much simpler than attacking a commercial installation directly. The threat is real and is actively exploited. One high-profile example is the recent Gentoo repo hack. There is a nice short write-up about the looming dangers in open-source infrastructure.
System tests tend to run much slower than unit tests. There should be very few of those tests, but even if there are few, they will take several minutes rather than the seconds that are usual for unit tests.
Proposed changes to the workflow/infrastructure
Possible even now, without a special shared Google Cloud project or big changes to the workflow:
System Tests automation as implemented with Airflow Breeze can be run by anyone who has a billable GCP project
Cloud Build integration with GCP is an optional step - needed only if the team working on a fork has its own GCP project and has set up Cloud Build integration
System Tests execution is already conditional and disabled by default, unless credentials are properly set up for Cloud Build
There is already a bootstrapping process that creates appropriate service accounts and sets up the GCP project to be able to run the tests automatically
System tests should only be a prerequisite for pull requests to become mergeable, rather than run on every change, because it takes a lot of time and resources to run them
Requires a common GCP project/service account and workflow adaptation:
System tests using shared credentials in the main Airflow repository should only be run after code from forks has been reviewed and approved, but before the merge happens - to verify that they will be runnable by everyone.
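The gating rule above can be expressed as a small decision function. This is only a sketch of the intended policy: `PullRequestState` and `may_use_shared_credentials` are hypothetical names introduced for illustration and do not correspond to any Travis CI, Cloud Build, or GitHub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestState:
    """Hypothetical snapshot of a pull request; not a real CI or GitHub type."""

    reviewed: bool
    approved: bool
    merged: bool


def may_use_shared_credentials(pr: PullRequestState) -> bool:
    """Allow shared-credential system tests only between approval and merge."""
    return pr.reviewed and pr.approved and not pr.merged


# Usage: an unreviewed change never reaches the shared GCP project.
pending = PullRequestState(reviewed=False, approved=False, merged=False)
accepted = PullRequestState(reviewed=True, approved=True, merged=False)
print(may_use_shared_credentials(pending))
print(may_use_shared_credentials(accepted))
```

Keeping the rule in one pure function makes it easy to check in isolation before wiring it to whichever CI system ends up enforcing it.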