Status
State: Draft
Discussion thread: https://lists.apache.org/thread.html/219e8cf818ae2803920a198d11e98c5e1aecd3c80cf774eb0cfe70eb@%3Cdev.airflow.apache.org%3E
JIRA: AIRFLOW-3081
Motivation
In the Apache Airflow project, contributors need to run integration tests (integration with GCP) automatically, specifically before any pending change is merged to the main repository. Such integration tests are being developed as part of the effort to add multiple GCP operators to Airflow. Currently, integration tests are being added alongside the GCF deploy/delete implementation in this pull request and to airflow-breeze (an easy-to-set-up Dockerized environment for Airflow) in this pull request. Both PRs are in the internal review stage and support running integration tests from the command line, but this is not yet integrated into the CI scripts.
The approach for contributing to Airflow (as described in the CONTRIBUTING documentation) is to create your own fork with your own copy of the Travis CI project, which runs unit tests automatically.
Airflow has CI scripts and an environment that allow Travis CI to run unit tests automatically, but integration tests, and any other tests that require communication with a real GCP project, are not executed. The quality of the library would improve considerably if integration tests were executed automatically before any merge to the main project. Running integration tests in this case mandates the use of a shared GCP project and the creation of an appropriate service account that has the permissions necessary to perform GCP operations. More than 30 operators for GCP are soon to be added to Airflow (including GCE, which allows starting and stopping machines). Virtually all of these operators could benefit from automated integration test execution.
A similar approach could be reused for other cloud operators, not only for GCP.
Considerations
Requirements/Constraints
For now we can focus only on GCP operators and later reuse the learnings for other clouds.
There will be many more GCP operators (not only GCF). Sharing this project (and the associated service account) is potentially dangerous if anyone can obtain the credentials and use the service account. This means that fork repositories should use their own GCP projects and service accounts, and set up Travis CI to use those for test execution.
There should, however, be a shared GCP project/service account against which tests for the main repository are executed before the merge to master happens. That would be a sanity check verifying that no special or forgotten setup in a personal GCP project prevents those tests from running for others.
The tests in the main GCP project should only be run after at least a code review, and possibly some kind of automated “vulnerability” inspection that could prevent attempts to abuse the GCP environment. Attacks on open-source infrastructure have recently become a powerful attack vector that is traditionally difficult to defend against: community/open-source projects are often rather relaxed about security, yet they are sometimes used in millions of commercial installations. Attacking open-source infrastructure is usually much simpler than attacking a commercial installation directly. The threat is real and is actively exploited. One high-profile example is the recent Gentoo repo hack. There is a nice short write-up about looming dangers in open-source infrastructure.
Integration tests tend to run much slower than unit tests. There should be very few of them, but even a few will take several minutes rather than the seconds that are usual for unit tests.
Proposed changes to the workflow/infrastructure
Possible even now, without a special shared GCP project or big changes to the workflow:
Integration test automation should be added as an extra step in the CI job
It should be possible to run the CI job with or without integration tests
Integration test execution should be conditional, based on whether credentials are properly set up in the CI environment
There should be an easy way to create an appropriate service account and generally make a GCP project ready to become a target for integration tests
Unit tests should run for all pushes to the repo, but integration tests should only be prerequisites for pull requests to become mergeable
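The conditional execution described above could be sketched roughly as follows. This is only an illustration, not the implementation from the pull requests: it assumes credentials are exposed to the CI job through the GOOGLE_APPLICATION_CREDENTIALS environment variable (the variable Google client libraries read by default), and the helper and test class names are hypothetical.

```python
import os
import unittest

# Assumption: the CI environment points this variable at a service account key file.
CREDENTIALS_ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def integration_credentials_available(environ=None):
    """Return True when the variable is set and points at an existing file."""
    environ = os.environ if environ is None else environ
    key_path = environ.get(CREDENTIALS_ENV_VAR, "")
    return bool(key_path) and os.path.isfile(key_path)


# Hypothetical test class: skipped entirely when no credentials are configured,
# so the same CI job can run with or without integration tests.
@unittest.skipUnless(
    integration_credentials_available(),
    "GCP credentials are not configured; skipping integration tests",
)
class ExampleGcpIntegrationTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would exercise a GCP operator against the configured project.
        self.assertTrue(integration_credentials_available())
```

With this shape, a fork without a GCP project still gets a green unit-test run, while a fork that has configured its own service account key also runs the integration tests.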
Requires a common GCP project/service account and workflow adaptation:
Integration tests using shared credentials in the main Airflow repository should only be run after code from forks has been reviewed and approved, but before the merge happens, to verify that they will be runnable by everyone.
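One way to picture the resulting gating rules is a small decision function. It is a sketch under assumptions: the BuildContext fields and the stages_to_run helper are hypothetical names used for illustration, and how a CI system would actually learn that a review has been approved is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildContext:
    """Hypothetical description of a single CI build."""
    is_pull_request: bool
    targets_main_repository: bool
    review_approved: bool
    credentials_configured: bool


def stages_to_run(ctx):
    """Return the test stages for a build, following the rules in this proposal."""
    stages = ["unit"]  # unit tests run for every push
    if not ctx.credentials_configured:
        return stages
    if ctx.targets_main_repository:
        # Shared credentials: only after the pull request was reviewed and approved.
        if ctx.is_pull_request and ctx.review_approved:
            stages.append("integration")
    else:
        # Fork using its own GCP project and service account.
        stages.append("integration")
    return stages
```

For example, an unapproved pull request against the main repository yields only the unit stage, while the same pull request after approval yields both unit and integration.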