Status
Motivation
Measure performance changes between Airflow versions. Identify changes that impact performance and CPU and memory resource utilization. Improve the transparency of performance changes in the release notes of Apache Airflow.
Deliver Airflow users a tool to measure the performance of their own installations and compare it across different setups or versions.
Considerations
What change do you propose to make?
The framework aims to measure the performance and resource footprint of Airflow components.
The framework does not and should not measure the performance or resource utilization of third-party code (including operators that are part of provider packages).
The tests should be executed as part of the Pull Request build process (TODO: how to plug it into the build pipeline).
The test results should be included as part of the PR documentation (if possible).
The proposed framework is based on the following concepts:
- Instance - definition of an Airflow installation setup.
- Performance DAG - definition of the DAG that is executed during the test.
- Test suite - combination of an Instance and a Performance DAG. A Test suite may include values for Instance or Performance DAG Jinja placeholders.
Instance
Instance is a concept that defines an Airflow setup.
An Instance is configured using a JSON file. The schema of the file is specific to the Instance type; the only field required in every Instance configuration file is instance_type. That field is used by the framework to pick the dedicated test execution solution (e.g. Docker Compose, Cloud Composer, GKE, MWAA, etc.).
The remaining fields of the instance configuration may include information such as:
- number of schedulers
- number of CPUs in schedulers
- worker CPU or memory (for Celery workers)
- machine types
The Instance configuration file can include Jinja-style placeholders that are populated by the Test suite.
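For illustration, a hypothetical instance configuration for a GKE-based setup might look as follows (all fields other than instance_type, and the {{ machine_type }} placeholder, are examples rather than part of this proposal):

```json
{
  "instance_type": "gke",
  "scheduler_count": 2,
  "scheduler_cpu": 2,
  "worker_cpu": 2,
  "worker_memory_gb": 8,
  "node_machine_type": "{{ machine_type }}"
}
```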
Test suite
A Test suite is defined by a configuration file in JSON or YAML format. It includes the following attributes:
- instance - object defining the instance configuration used in the test run:
  - instance_specification_path - path to the instance definition file.
  - args - map of variable values replacing Jinja placeholders of the instance configuration file.
- performance_dag - object defining the Performance DAG run during the test:
  - performance_dag_specification_path - path to the Performance DAG specification file.
  - dag_file_path - alternatively, path to a DAG file (Python code).
  - args - map of variable values replacing Jinja placeholders of the Performance DAG configuration file.
- attempts - number of times the suite should be run. The reason for having multiple attempts is to take multiple measurements in order to increase result stability. By default, the Airflow instance is torn down and recreated between attempts.
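As a sketch, a Test suite file combining the attributes above could look like this (all paths and argument names are illustrative):

```yaml
instance:
  instance_specification_path: instances/gke_small.json
  args:
    machine_type: n1-standard-2
performance_dag:
  performance_dag_specification_path: dags/performance_dag.json
  args:
    dag_count: 10
    task_count: 20
attempts: 3
```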
Consideration: a Test suite could include a list of instance objects and a list of performance_dag objects to run all combinations of those attempts times. For example, 3 instance objects, 4 performance_dag objects and attempts = 5 produce 12 combinations, each executed 5 times, resulting in 60 runs.
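Under this consideration, the same file could accept lists, for example (sketch):

```yaml
instance:
  - instance_specification_path: instances/small.json
  - instance_specification_path: instances/large.json
performance_dag:
  - performance_dag_specification_path: dags/few_dags.json
  - performance_dag_specification_path: dags/many_dags.json
attempts: 5
```

This sketch would yield 4 combinations, each executed 5 times, i.e. 20 runs.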
Running tests
The tests are executed for each Test suite.
The test runner script may accept the following parameters:
- test-suite-path - path to the Test suite definition file.
- reuse-if-exists - tells the script to reuse the Instance if it exists, or create a new one otherwise. If the flag is not set and the Instance exists, the script is expected to fail.
- delete-if-exists - causes the script to delete an existing environment with the same name as the one specified in the specification file and then recreate it.
- keep-on-finish - leaves the instances created for the test in place after the tests complete. By default, instances created during the tests are deleted.
- output-path - name of the folder to store the output results in.
- dry-run - test the configuration without executing the run.
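A hypothetical invocation, assuming the runner script is exposed as run_performance_test.py (the script name is not part of this proposal):

```
python run_performance_test.py \
  --test-suite-path suites/gke_small.yaml \
  --delete-if-exists \
  --output-path results/gke_small
```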
A test run is a state machine that walks through the following steps:
- prepare the instance (either by reusing or deleting an existing one)
- generate and upload copies of the Performance DAG
- wait for all expected Performance DAGs to be parsed by the scheduler
- unpause the Performance DAGs
- wait for all the expected Performance Dag Runs to finish (with either a success or failure)
- write the performance metrics in CSV format to the results table
- delete the instance (unless the keep-on-finish flag was set)
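A minimal sketch of this state machine in Python, assuming a hypothetical handler object exposing the operations above (none of these method names are defined by this proposal):

```python
import time

def run_test_attempt(handler, suite):
    """Walk a single test attempt through the steps described above."""
    # Prepare the instance (reuse or delete an existing one, per the flags).
    handler.prepare_instance(reuse=suite.reuse_if_exists,
                             delete=suite.delete_if_exists)

    # Generate and upload copies of the Performance DAG.
    dag_ids = handler.upload_performance_dags(suite.performance_dag)

    # Wait until the scheduler has parsed every expected DAG.
    while not handler.dags_parsed(dag_ids):
        time.sleep(30)

    handler.unpause_dags(dag_ids)

    # Wait for every expected Dag Run to reach success or failure.
    while not handler.dag_runs_finished(dag_ids):
        time.sleep(30)

    # Write the performance metrics in CSV format to the results table.
    handler.save_metrics_csv(suite.output_path)

    # Delete the instance unless keep-on-finish was set.
    if not suite.keep_on_finish:
        handler.delete_instance()
```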
Creating the Airflow instance is delegated to a dedicated class handling the given instance_type. Initially, the following Instance handlers will be implemented:
- gke - creates the Airflow instance in Google Kubernetes Engine.
- composer - creates the Airflow instance with Google Cloud Composer.
- docker - creates the Airflow instance with Docker Compose.

Additional Instance providers can be included in the test framework and mapped to the corresponding instance_type value. The list above comes with the package from the Cloud Composer team's initial code contribution.
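Dispatching on instance_type could be as simple as a registry of handler classes, sketched below (class and function names are assumptions, not part of the contributed package):

```python
from abc import ABC, abstractmethod

class InstanceHandler(ABC):
    """Base class for instance_type-specific test execution logic."""

    @abstractmethod
    def prepare_instance(self, reuse: bool, delete: bool) -> None:
        ...

# Maps an instance_type value to the class handling it.
INSTANCE_HANDLERS = {}

def register(instance_type: str):
    """Class decorator registering a handler for an instance_type value."""
    def decorator(cls):
        INSTANCE_HANDLERS[instance_type] = cls
        return cls
    return decorator

@register("gke")
class GkeInstanceHandler(InstanceHandler):
    def prepare_instance(self, reuse: bool, delete: bool) -> None:
        pass  # create or reuse an Airflow deployment on GKE

def get_handler(instance_config: dict) -> InstanceHandler:
    # instance_type is the only field required in every configuration file.
    return INSTANCE_HANDLERS[instance_config["instance_type"]]()
```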
Results format
Test results are stored in a folder created for the test run. The folder name is provided as an argument to the test runner script. Naming the folder after the Test suite name and date could be considered, but a run can also include additional parameters populating Jinja variables, so the name and date alone may not fully identify it.
The resulting folder includes the following files:
- Instance configuration file with resolved Jinja variables values.
- Performance DAG configuration file with resolved Jinja variables values.
- Test suite arguments - full list of arguments used to run the suite including default values applied.
- Test result metrics CSV file.
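The output folder might therefore look like this (file names are illustrative):

```
<output-path>/
├── instance_configuration.json         # resolved Jinja variable values
├── performance_dag_configuration.json  # resolved Jinja variable values
├── test_suite_arguments.json           # including applied defaults
└── results.csv                         # metrics described below
```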
The test results metrics CSV file has the following columns:
- scheduler_memory_average - average utilization of Scheduler memory
- scheduler_memory_max - max utilization of Scheduler memory
- scheduler_cpu_average - average utilization of Scheduler CPU
- scheduler_cpu_max - max utilization of Scheduler CPU
- test_start_date - earliest start_date of any Dag Run created as part of the test
- test_end_date - latest end_date of any Dag Run created as part of the test
- test_duration - the difference between test_end_date and test_start_date in seconds
- dag_run_total_count - total number of test Dag Runs
- dag_run_success_count - number of test Dag Runs that finished with a success
- dag_run_failed_count - number of test Dag Runs that finished with a failure
- dag_run_average_duration - average duration of test Dag Runs, where duration is calculated as the difference between the Dag Run's end_date and start_date
- dag_run_min_duration - minimal duration of any of the test Dag Runs
- dag_run_max_duration - maximal duration of any of the test Dag Runs
- task_instance_total_count - total number of Task Instances belonging to test Dag Runs
- task_instance_average_duration - average duration of test Task Instances
- task_instance_min_duration - minimal duration of any of the test Task Instances
- task_instance_max_duration - maximal duration of any of the test Task Instances
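As a sketch of how the Dag Run columns could be derived, assuming each run is reduced to a simple record with a state, start_date and end_date (the record type is hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DagRunRecord:
    state: str  # "success" or "failed"
    start_date: datetime
    end_date: datetime

def dag_run_metrics(runs):
    """Compute the Dag Run metric columns described above."""
    durations = [(r.end_date - r.start_date).total_seconds() for r in runs]
    test_start = min(r.start_date for r in runs)
    test_end = max(r.end_date for r in runs)
    return {
        "test_start_date": test_start.isoformat(),
        "test_end_date": test_end.isoformat(),
        "test_duration": (test_end - test_start).total_seconds(),
        "dag_run_total_count": len(runs),
        "dag_run_success_count": sum(r.state == "success" for r in runs),
        "dag_run_failed_count": sum(r.state == "failed" for r in runs),
        "dag_run_average_duration": sum(durations) / len(durations),
        "dag_run_min_duration": min(durations),
        "dag_run_max_duration": max(durations),
    }
```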
What problem does it solve?
- It makes performance changes visible to Airflow users.
- It also helps identify changes that have an unacceptably negative impact on Airflow performance or resource utilization.
- It helps users adapt to new releases' requirements by adding more resources to their Airflow configurations if necessary, which has a positive impact on the adoption of new versions.
- Lastly, it sends the message that the Airflow community takes performance seriously and validates the solution against important metrics.
Why is it needed?
Currently there are no performance tests running for Apache Airflow.
There is no visibility into performance changes and resource utilization (especially when it grows).
Adoption of new versions is exposed to unexpected failures, which has a negative impact on the stability of Airflow.
The tests can be used both during Airflow builds and by users, who can individually test performance with their own setups.