Status

State: Draft

Discussion thread: https://lists.apache.org/thread.html/34932447daee4794be4765eff851de9fd68abf63a91ac81f9db1ef7a@%3Cdev.airflow.apache.org%3E

Motivation

Currently, the workflow for submitting new contributions is cumbersome. Even running the unit tests is difficult, whether you run them locally or by creating your own TravisCI pipeline. Airflow tests depend on many external services and other custom setup, which makes it hard for contributors and committers to work on this codebase. CI builds have also been unreliable, and their failures are hard to reproduce. Requiring contributors to emulate the build environment by hand every time makes it easy to end up in an "it works on my machine" situation.

The goal of this proposal is to outline the work needed to make local testing significantly easier and to standardise best practices for contributing to the Airflow project.


Considerations

Requirements / Constraints

  • TravisCI unit tests should be reproducible locally
  • Local reproducibility of integration tests is optional for now, to keep this simple
  • Extensive documentation is required
  • Integration tests are difficult to run in a local environment because some of them are intrinsically coupled to cloud services. See AIP-4 Support for System Tests for external systems for an example of this.

  • The current environment setup on TravisCI (the setup before the tests are run) takes around 4 minutes. Some of this could perhaps be optimised by reducing the Docker image size, but right now the image is hard to shrink because we pre-install Hadoop, Hive and MiniCluster.

Proposed changes to the workflow / infrastructure

  • Creation of a separate incubator-airflow-ci repo, where a CI/dev base image with all dependencies is built. This has already been done in apache/incubator-airflow-ci
  • Setting up docker-compose for container orchestration and configuration.
    • This simplifies the setup of services like MySQL, PostgreSQL, OpenLDAP, krb5 and rabbitmq which are needed for both running Airflow and running some unit and integration tests. 
    • The same setup should allow us to add further service dependencies as needs arise
    • The initial work has been submitted and merged already in apache/incubator-airflow/pull/3393
  • MiniCluster should be moved to its own image and orchestrated through the docker-compose setup
  • Strip out Tox and fully rely on our docker setup
  • Bake the build script into the CI Docker image
  • Current image sizes should be reduced to the bare minimum for speed (Optimizing Docker Image Workflow)
  • Current Kubernetes CI scripts should be run on GKE instead of via minikube (Kubernetes Testing: Using GKE instead of Minikube)
  • Create a developer guide
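
To make the docker-compose item above more concrete, here is a minimal Python sketch of how a Compose-style configuration for the backing services might be assembled. It is not the configuration from the merged pull request. The image tags, environment values, the `example/airflow-ci:latest` image name, the `airflow-testing` service name and the `./:/app` mount are all assumptions chosen for illustration. OpenLDAP, krb5 and MiniCluster are left out because no particular images for them are named here.

```python
import json

# Hypothetical service definitions, for illustration only. The image tags and
# credentials below are assumptions, not the project's real settings.
SERVICE_DEFINITIONS = {
    "mysql": {
        "image": "mysql:5.7",
        "environment": {
            "MYSQL_ALLOW_EMPTY_PASSWORD": "true",
            "MYSQL_DATABASE": "airflow",
        },
    },
    "postgres": {
        "image": "postgres:9.6",
        "environment": {
            "POSTGRES_USER": "airflow",
            "POSTGRES_PASSWORD": "airflow",
            "POSTGRES_DB": "airflow",
        },
    },
    "rabbitmq": {"image": "rabbitmq:3"},
}


def build_compose_config(services, test_image="example/airflow-ci:latest"):
    """Return a Compose-style dict containing the requested backing services
    plus one test-runner container that depends on all of them."""
    unknown = sorted(set(services) - set(SERVICE_DEFINITIONS))
    if unknown:
        raise ValueError("unknown services: " + ", ".join(unknown))

    config = {"version": "2.2", "services": {}}
    for name in services:
        # Shallow-copy so callers can tweak the result without touching the defaults.
        config["services"][name] = dict(SERVICE_DEFINITIONS[name])

    # Hypothetical test-runner container; the image name and mount are placeholders.
    config["services"]["airflow-testing"] = {
        "image": test_image,
        "depends_on": list(services),
        "volumes": ["./:/app"],
    }
    return config


if __name__ == "__main__":
    # Print the configuration as JSON for inspection.
    print(json.dumps(build_compose_config(["mysql", "postgres", "rabbitmq"]), indent=2))
```

Adding a further service dependency then amounts to adding one entry to the dictionary, which mirrors the extensibility goal listed above. Because JSON is a subset of YAML, the printed output could in principle be saved to a file and passed to docker-compose with `-f`, although a hand-written YAML file is the more usual choice.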

References

