Status
Motivation
Measure performance changes between Airflow versions. Identify changes that impact performance and CPU and memory resource utilization. Improve the transparency of performance changes in the release notes of Apache Airflow.
Deliver Airflow users a tool to measure the performance of their own installations and compare it across different setups or versions.
Considerations
What change do you propose to make?
The framework aims to measure the performance and resource footprint of Airflow components.
The framework does not and should not measure the performance or resource utilization of third-party code (including operators that are part of provider packages).
The tests should be executed as part of the Pull Request build process (TODO: how to plug it into the build pipeline).
The test results should be included as part of the PR documentation (if possible).
The proposed framework is based on the following concepts:
- Instance - definition of an Airflow installation setup.
- Performance DAG - definition of the DAG that is executed during the test.
- Test suite - combination of an Instance and a Performance DAG. A Test suite may include values for Instance or Performance DAG Jinja placeholders.
Instance
Instance is a concept that defines an Airflow setup.
An Instance is configured using a JSON file. The schema of the file is specific to the Instance type; the only field required in every Instance configuration file is instance_type. That field is used by the framework to pick the dedicated test execution solution (e.g. Docker Compose, Cloud Composer, GKE, MWAA, etc.).
The remaining fields of the instance configuration may include information such as:
- number of schedulers
- number of CPUs in schedulers
- worker CPU or memory (for Celery workers)
- machine types
The Instance configuration file can include Jinja-style placeholders that are populated by the Test suite.
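For illustration, a hypothetical instance configuration for a GKE-based setup might look as follows (all fields other than instance_type, and the {{ machine_type }} placeholder, are examples rather than part of this proposal):

```json
{
  "instance_type": "gke",
  "scheduler_count": 2,
  "scheduler_cpu": 2,
  "worker_cpu": 2,
  "worker_memory_gb": 8,
  "node_machine_type": "{{ machine_type }}"
}
```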
Test suite
A Test suite is defined by a configuration file in JSON or YAML format. It includes the following attributes:
- instance - object defining the instance configuration used in the test run:
  - instance_specification_path - path to the instance definition file.
  - args - map of variable values replacing Jinja placeholders of the instance configuration file.
- performance_dag - object defining the Performance DAG run during the test:
  - performance_dag_specification_path - path to the Performance DAG specification file.
  - dag_file_path - alternatively, path to a DAG file (Python code).
  - args - map of variable values replacing Jinja placeholders of the Performance DAG configuration file.
- attempts - number of times the suite should be run. The reason for having multiple attempts is to take multiple measurements in order to increase result stability. By default, the Airflow instance is torn down and recreated between attempts.
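As a sketch, a Test suite file combining the attributes above could look like this (all paths and argument names are illustrative):

```yaml
instance:
  instance_specification_path: instances/gke_small.json
  args:
    machine_type: n1-standard-2
performance_dag:
  performance_dag_specification_path: dags/performance_dag.json
  args:
    dag_count: 10
    task_count: 20
attempts: 3
```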
Consideration: a Test suite could include a list of instance objects and a list of performance_dag objects to run all combinations of those attempts times. For example, 3 instance objects, 4 performance_dag objects and attempts = 5 produce 12 combinations, each executed 5 times, resulting in 60 runs.
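Under this consideration, the same file could accept lists, for example (sketch):

```yaml
instance:
  - instance_specification_path: instances/small.json
  - instance_specification_path: instances/large.json
performance_dag:
  - performance_dag_specification_path: dags/few_dags.json
  - performance_dag_specification_path: dags/many_dags.json
attempts: 5
```

This sketch would yield 4 combinations, each executed 5 times, i.e. 20 runs.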
Running tests
The tests are executed for each Test suite.
The test runner script may accept the following parameters:
- test-suite-path - path to the Test suite definition file.
- reuse-if-exists - tells the script to reuse the Instance if it exists, or create a new one otherwise. If the flag is not set and the Instance exists, the script is expected to fail.
- delete-if-exists - causes the script to delete an existing environment with the same name as the one specified in the specification file and then recreate it.
- keep-on-finish - leaves the instances created for the test in place after the tests complete. By default, instances created during the tests are deleted.
- output-path - name of the folder to store the output results in.
- dry-run - test the configuration without executing the run.
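A hypothetical invocation, assuming the runner script is exposed as run_performance_test.py (the script name is not part of this proposal):

```
python run_performance_test.py \
  --test-suite-path suites/gke_small.yaml \
  --delete-if-exists \
  --output-path results/gke_small
```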
A test run is a state machine that walks through the following steps:
- prepare the instance (either by reusing or deleting an existing one)
- generate and upload copies of the Performance DAG
- wait for all expected Performance DAGs to be parsed by the scheduler
- unpause the Performance DAGs
- wait for all the expected Performance Dag Runs to finish (with either a success or failure)
- write the performance metrics in CSV format to the results table
- delete the instance (unless the keep-on-finish flag was set)
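A minimal sketch of this state machine in Python, assuming a hypothetical handler object exposing the operations above (none of these method names are defined by this proposal):

```python
import time

def run_test_attempt(handler, suite):
    """Walk a single test attempt through the steps described above."""
    # Prepare the instance (reuse or delete an existing one, per the flags).
    handler.prepare_instance(reuse=suite.reuse_if_exists,
                             delete=suite.delete_if_exists)

    # Generate and upload copies of the Performance DAG.
    dag_ids = handler.upload_performance_dags(suite.performance_dag)

    # Wait until the scheduler has parsed every expected DAG.
    while not handler.dags_parsed(dag_ids):
        time.sleep(30)

    handler.unpause_dags(dag_ids)

    # Wait for every expected Dag Run to reach success or failure.
    while not handler.dag_runs_finished(dag_ids):
        time.sleep(30)

    # Write the performance metrics in CSV format to the results table.
    handler.save_metrics_csv(suite.output_path)

    # Delete the instance unless keep-on-finish was set.
    if not suite.keep_on_finish:
        handler.delete_instance()
```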
Creating the Airflow instance is delegated to a dedicated class handling the given instance_type. Initially, the following Instance handlers will be implemented:
- gke - creates the Airflow instance in Google Kubernetes Engine.
- composer - creates the Airflow instance with Google Cloud Composer.
- docker - creates the Airflow instance with Docker Compose.

Additional Instance providers can be included in the test framework and mapped to the corresponding instance_type value. The list above comes with the package from the Cloud Composer team's initial code contribution.
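Dispatching on instance_type could be as simple as a registry of handler classes, sketched below (class and function names are assumptions, not part of the contributed package):

```python
from abc import ABC, abstractmethod

class InstanceHandler(ABC):
    """Base class for instance_type-specific test execution logic."""

    @abstractmethod
    def prepare_instance(self, reuse: bool, delete: bool) -> None:
        ...

# Maps an instance_type value to the class handling it.
INSTANCE_HANDLERS = {}

def register(instance_type: str):
    """Class decorator registering a handler for an instance_type value."""
    def decorator(cls):
        INSTANCE_HANDLERS[instance_type] = cls
        return cls
    return decorator

@register("gke")
class GkeInstanceHandler(InstanceHandler):
    def prepare_instance(self, reuse: bool, delete: bool) -> None:
        pass  # create or reuse an Airflow deployment on GKE

def get_handler(instance_config: dict) -> InstanceHandler:
    # instance_type is the only field required in every configuration file.
    return INSTANCE_HANDLERS[instance_config["instance_type"]]()
```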
Results format
Test results are stored in a folder created for the test run. The folder name is provided as an argument to the test runner script. Naming the folder after the Test suite name and date could be considered, but a run can also include additional parameters populating Jinja variables, so the name and date alone may not fully identify it.
The resulting folder includes the following files:
- Instance configuration file with resolved Jinja variables values.
- Performance DAG configuration file with resolved Jinja variables values.
- Test suite arguments - full list of arguments used to run the suite including default values applied.
- Test result metrics CSV file.
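The output folder might therefore look like this (file names are illustrative):

```
<output-path>/
├── instance_configuration.json         # resolved Jinja variable values
├── performance_dag_configuration.json  # resolved Jinja variable values
├── test_suite_arguments.json           # including applied defaults
└── results.csv                         # metrics described below
```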
The test results metrics CSV file has the following columns:
- scheduler_memory_average - average utilization of Scheduler memory
- scheduler_memory_max - max utilization of Scheduler memory
- scheduler_cpu_average - average utilization of Scheduler CPU
- scheduler_cpu_max - max utilization of Scheduler CPU
- test_start_date - earliest start_date of any Dag Run created as part of the test
- test_end_date - latest end_date of any Dag Run created as part of the test
- test_duration - the difference between test_end_date and test_start_date in seconds
- dag_run_total_count - total number of test Dag Runs
- dag_run_success_count - number of test Dag Runs that finished with a success
- dag_run_failed_count - number of test Dag Runs that finished with a failure
- dag_run_average_duration - average duration of test Dag Runs, where duration is calculated as the difference between the Dag Run's end_date and start_date
- dag_run_min_duration - minimal duration of any of the test Dag Runs
- dag_run_max_duration - maximal duration of any of the test Dag Runs
- task_instance_total_count - total number of Task Instances belonging to test Dag Runs
- task_instance_average_duration - average duration of test Task Instances
- task_instance_min_duration - minimal duration of any of the test Task Instances
- task_instance_max_duration - maximal duration of any of the test Task Instances
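As a sketch of how the Dag Run columns could be derived, assuming each run is reduced to a simple record with a state, start_date and end_date (the record type is hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DagRunRecord:
    state: str  # "success" or "failed"
    start_date: datetime
    end_date: datetime

def dag_run_metrics(runs):
    """Compute the Dag Run metric columns described above."""
    durations = [(r.end_date - r.start_date).total_seconds() for r in runs]
    test_start = min(r.start_date for r in runs)
    test_end = max(r.end_date for r in runs)
    return {
        "test_start_date": test_start.isoformat(),
        "test_end_date": test_end.isoformat(),
        "test_duration": (test_end - test_start).total_seconds(),
        "dag_run_total_count": len(runs),
        "dag_run_success_count": sum(r.state == "success" for r in runs),
        "dag_run_failed_count": sum(r.state == "failed" for r in runs),
        "dag_run_average_duration": sum(durations) / len(durations),
        "dag_run_min_duration": min(durations),
        "dag_run_max_duration": max(durations),
    }
```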
What problem does it solve?
- It makes performance changes visible to Airflow users.
- It also helps identify changes that have an unacceptably negative impact on Airflow performance or resource utilization.
- It helps users adapt to new releases' requirements by adding more resources to their Airflow configurations if necessary, which has a positive impact on the adoption of new versions.
- Lastly, it sends the message that the Airflow community takes performance seriously and validates the solution against important metrics.
Why is it needed?
Currently there are no performance tests running for Apache Airflow.
There is no visibility into performance changes and resource utilization (especially when it grows).
Adoption of new versions is exposed to unexpected failures, which has a negative impact on the stability of Airflow.
The tests can be used both during Airflow builds and by users, who can individually test performance with their own setups.