
Source of benchmarks: https://github.com/apache/flink-benchmarks

Jenkins running benchmarks: http://codespeed.dak8s.net:8080

Benchmarks WebUI: http://codespeed.dak8s.net:8000/changes

If you merge a performance-critical change (e.g. code paths executed per record, state entries, etc.) to master, you should verify that it did not cause a regression compared with the previous state of master. An existing performance test suite periodically runs on the master branch; you can check the timeline of its results after some time in this UI. If you are unsure about your changes and want to check for a regression before merging, contact a PMC member to submit a benchmark request and check the results in the comparison UI.

More details can be found in the mailing list announcement.

If you want to know how to execute the benchmarks locally, please take a look at the benchmarks' README.

Submitting a benchmark request

In order to submit a benchmark request you need access to Jenkins (http://codespeed.dak8s.net:8080 — the benchmarking infrastructure is hosted using resources outside of the Apache Software Foundation). To get access, contact one of the Apache Flink PMC members. If you already have access, you should:

  1. Push your changes to a branch in some clone of the Flink GitHub repository.
  2. Trigger a new build of the Jenkins benchmark request project. You can specify the branch and the GitHub repository that should be used for this benchmark request, both for the benchmarked Flink code and for the benchmark code itself.
    1. Please select as small a set of benchmarks to execute as possible. If you select a single benchmark (for example org.apache.flink.benchmark.SortingBoundedInputBenchmarks), the benchmark-request build will take ~11 minutes. Executing all benchmarks can take ~2 hours to complete.
    2. Please make sure that nobody else is benchmarking at the same time. Benchmark runs are queued one after another, but the comparison view in the benchmarks WebUI compares only the single latest result against the latest master, so make sure you do not overwrite someone else's results while they are still analysing them. You can check who ran the last comparison, and when, by looking at the most recent benchmark request run on Jenkins and who started it. If the last activity was more than a day ago, you are probably good to go. If not, please contact the person who triggered the latest build and coordinate your efforts.
  3. If your benchmark-request build is blocked by other builds that were started by the timer, feel free to cancel those timer builds.
    1. NEVER cancel builds started by other users, at least without asking for their permission first.
    2. You can check who started a build in Jenkins: you will see either "Started by timer" or "Started by USER_XYZ".
  4. Once the benchmark request build finishes, the results will be:
    1. Archived as a build artefact on Jenkins as a CSV file. This is easier if you are looking for the results of one or a few particular benchmarks.
    2. Pushed to the comparison UI. This view might be a bit cluttered, but it can be useful if you are checking the results of all the benchmarks.
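The CSV artefact follows the usual JMH result layout (this is a sketch against a hypothetical excerpt; exact column names can differ between JMH versions). A small script like the following can pull out the scores of the benchmarks you care about:

```python
import csv
import io

# Hypothetical excerpt of a JMH CSV artefact; the real file may contain
# hundreds of rows and slightly different column names.
SAMPLE = '''"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,250.5,12.3,"ops/ms"
'''

def read_scores(text):
    """Map each benchmark name to its (score, error) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Benchmark"]: (float(row["Score"]), float(row["Score Error (99.9%)"]))
        for row in reader
    }

for name, (score, err) in read_scores(SAMPLE).items():
    print(f"{name}: {score} ± {err}")
```

Filtering the CSV this way is often quicker than scrolling through the comparison UI when you only care about one or two benchmarks.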

Jenkins

The general structure of projects on Jenkins is as follows.

Main benchmarks are executed in the `flink-master-benchmarks` project. This job periodically executes the main benchmarks and uploads the results to the WebUI. `flink-statebackend-benchmark` does the same, but for the State Backend benchmarks.

Each of those two projects has a corresponding "benchmark-request" project (`flink-benchmark-request` and `flink-statebackend-benchmark-request`, respectively). These execute the same set of benchmarks (again main and State Backend, respectively), but they test code from a different branch (usually benchmark-request) rather than from master, and upload the results to the WebUI as a comparison against the latest master results. Benchmark request jobs have to be triggered manually on Jenkins.

How to handle a benchmark regression

When a benchmark regression is detected, the following steps will help you deal with it:

  1. Create a Jira ticket (one per group of related benchmarks). Set affects and fix versions to the current Flink version, component=Benchmarks, type=Bug, priority=Blocker.
  2. Post the ticket in the #flink-dev-benchmarks Slack channel (replying in a thread).
  3. Verify that the regression is real and investigate the cause. Take FLINK-30623 as an example:
    1. Inspect the timeline by following the link (http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=checkpointSingleInput.UNALIGNED&extr=on&quarts=on&equid=off&env=2&revs=200) from the notification. A suspicious commit range can be obtained from the figure; for this example, the suspicious range is 13ef498172b...fb272D2cdebf.
    2. Narrow down the commit range via git log. You can directly locate a specific commit based on experience, or compare the benchmark results of each commit in the range; if the regression is real, a culprit commit will be found. See the instructions above for using benchmark-request; you can also try to benchmark locally. The http://codespeed.dak8s.net:8080 benchmarking infrastructure is hosted using resources provided by Ververica (Alibaba) and maintained by the PMC and Ververica; please contact one of the Apache Flink PMC members to get access. For example, two benchmark requests were submitted to verify whether FLINK-30533 caused the regression.
    3. Changes in flink-benchmarks may also cause a regression, so don't forget to check whether flink-benchmarks has changed recently.
    4. If a regression cannot be reproduced stably because it was caused by noise in the results or by issues with the physical machines (like FLINK-18614), the regression is not real.
  4. Post the benchmark results under the Jira ticket and, if the regression is real, ping the authors of the commit (or other relevant developers) to investigate it. Otherwise, set the resolution of the Jira ticket to "Not a bug", post the conclusion and close the ticket.
  5. If a regression is not fixed within a week of confirming that a commit is its root cause, contact the release manager to revert that commit (after confirming via benchmark-request that reverting the changes resolves the issue).
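Whether a score drop is a real regression is ultimately a judgment call against each benchmark's normal noise level, but the comparison you perform when eyeballing two result sets can be sketched as follows (a simplified illustration with made-up scores and a hypothetical 5% threshold; higher scores are assumed to be better, as in JMH throughput mode):

```python
def find_regressions(baseline, current, threshold=0.05):
    """Return benchmarks whose score dropped by more than `threshold`
    (5% by default) relative to the baseline, mapped to the relative change."""
    regressions = {}
    for name, base_score in baseline.items():
        new_score = current.get(name)
        if new_score is None:
            continue  # benchmark missing from the new run; nothing to compare
        change = (new_score - base_score) / base_score
        if change < -threshold:
            regressions[name] = change
    return regressions

# Made-up scores for illustration only.
baseline = {"checkpointSingleInput.UNALIGNED": 1000.0, "sortedOneInput": 250.0}
current = {"checkpointSingleInput.UNALIGNED": 900.0, "sortedOneInput": 249.0}
print(find_regressions(baseline, current))
# {'checkpointSingleInput.UNALIGNED': -0.1}
```

A 10% drop is flagged while a 0.4% drop is treated as noise; in practice you would tune the threshold per benchmark, since some benchmarks are far noisier than others.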
