Note: the references to the nose testing tool are outdated, as the community has switched to pytest for testing. See the development guide.

This page provides tips and tricks for fixing flaky tests. Flaky tests are tests that fail intermittently on CI builds; they may indicate stability problems or improper handling of edge cases.

...

Always make sure you're able to reproduce the test failure using the random seed and environment info before you jump to a fix. This part is covered in Reproducing test results.
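
As a rough illustration, here is a minimal sketch of how a test can be pinned to the seed reported in a CI log, assuming the test lives under "tests/python/unittest/" and uses MXNet's with_seed helper from common.py (which honours a fixed seed when one is passed). The test name and seed value below are hypothetical examples, not taken from a real log; consult Reproducing test results for the authoritative steps.

```python
import mxnet as mx
import numpy as np
from common import with_seed  # helper in tests/python/unittest/common.py

# Pinning the seed reported in the CI log (1234 here is just an example value)
# makes the random inputs deterministic, so the failure can be reproduced locally.
@with_seed(1234)
def test_example_op():  # hypothetical test name
    data = mx.nd.random.uniform(shape=(3, 4))
    assert np.isfinite(data.asnumpy()).all()
```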

...

Some flaky tests could also be caused by other reasons, but the 3 reasons above cover nearly all the cases the author has encountered. Attributing the failure to one of these 3 reasons is therefore the recommended first step in finding the root cause of flakiness.

...

As said above, the very first thing to do at this stage is to try your best to attribute the cause to one of the 3 major reasons. The log should contain the name of the failed test, the random seed used in the test run, the error calculated with the formula error = |expected - actual| / (rtol * |expected| + atol), the position where the maximum error occurred, and the tolerance levels used for the test run; all of these are essential for identifying the root cause. Then move on to analyzing this information from the test log. Usually, if the error value is small (close to 1) and the tolerance levels are also very small, the problem is with the tolerance level settings. If the error is very high, it may indicate a problem with the implementation. On the other hand, if you're able to consistently reproduce the same error with the same seed, the failure is unlikely to be caused by a race condition; otherwise, the root cause may be a race condition in the code.
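
To make the formula concrete, below is a minimal sketch of how such an error value can be computed with NumPy; a value above 1 means the tolerances were violated, and a value only slightly above 1 usually means the tolerances were just a bit too tight. This is an illustration only, not necessarily the exact helper the test framework uses.

```python
import numpy as np

def max_tolerance_error(expected, actual, rtol=1e-5, atol=1e-8):
    """Element-wise error = |expected - actual| / (rtol * |expected| + atol).
    Returns the maximum error and the position where it occurs."""
    expected = np.asarray(expected, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    error = np.abs(expected - actual) / (rtol * np.abs(expected) + atol)
    pos = np.unravel_index(np.argmax(error), error.shape)
    return error[pos], pos

# An error only slightly above 1 usually points at tolerances that are a bit
# too tight rather than at a broken implementation.
err, pos = max_tolerance_error([1.0, 2.0, 3.0], [1.0, 2.000021, 3.0])
print(err, pos)  # roughly 1.05 at position (1,)
```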

Let's take the log above as an example: the error is relatively small (close to 1), and the tolerance levels are quite small at the same time, which means the difference between the actual and expected values only exceeds the allowed range by a very small amount. So for this one we can conclude that the root cause is improper tolerance level settings, and a quick fix can be made by bumping up the tolerance levels. PR #12527 fixed the problem by bumping up the tolerance.
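
As a sketch of what such a tolerance bump can look like inside a test, assuming mxnet.test_utils.assert_almost_equal is used for the comparison (the function name, the helper check_operator_output, and the tolerance values below are illustrative, not the actual change made in PR #12527):

```python
from mxnet.test_utils import assert_almost_equal

def check_operator_output(expected, actual):
    # Before the fix: tolerances too tight for the accumulated float32 error,
    # so some random seeds produced errors just above 1 and the test failed.
    # assert_almost_equal(actual, expected, rtol=1e-5, atol=1e-5)

    # After the fix: slightly relaxed tolerances that still catch real bugs.
    # (Illustrative values only; choose them based on the observed error.)
    assert_almost_equal(actual, expected, rtol=1e-4, atol=1e-4)
```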

How to Fix Flakiness in Tests

...

After you have made the necessary changes, please make sure that you:

  • Re-compile MXNet if necessary

...

  • and verify the fix with the corresponding environment by running the same tests for more than 10000 passes (this can now be done more easily with the Automated Flaky Test Detector); see the sketch after this list

...

  • Submit a PR according to 

...

...

  • Give your PR a proper title and remember to refer to the tracking issue for the flaky test in your PR

...

  • Address any code review comments on your PR

...
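
For the verification step above, here is a minimal sketch of running a single test many times with varying seeds. It assumes the test honours the MXNET_TEST_SEED environment variable (as tests using MXNet's with_seed decorator do), that pytest is installed, and that the script is run from the repository root; the test node id and trial count are hypothetical examples. The Automated Flaky Test Detector automates this kind of loop, so treat this only as an illustration.

```python
import os
import random
import subprocess
import sys

# Illustrative values; the real detector runs many more trials.
TEST_NODE_ID = "tests/python/unittest/test_operator.py::test_example_op"  # hypothetical
NUM_TRIALS = 100

failures = 0
for trial in range(NUM_TRIALS):
    env = dict(os.environ)
    # Use a different seed for each run so more of the seed space is covered.
    env["MXNET_TEST_SEED"] = str(random.getrandbits(31))
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", TEST_NODE_ID],
        env=env,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        failures += 1
        print(f"Trial {trial} failed with seed {env['MXNET_TEST_SEED']}")
        print(result.stdout[-2000:])

print(f"{failures}/{NUM_TRIALS} trials failed")
```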

After Fix is Merged

After you have addressed all comments on your PR, it should be good to merge. Once it's merged, make sure you also check that the related GitHub issue is closed so that we have accurate tracking. Flakiness may still exist even after a fix is delivered, as 10000 trials may not be enough to cover every possible random seed, so please stay alert for future occurrences.

Appendix I: Location of test code

Logs usually come with the test name in the form "test_xxx.test_yyy", where the "test_xxx" part is the name of the test file and "test_yyy" is the name of the actual test. All test files are located under the "tests/python/" folder. Since some test files import tests from other test files in order to run them in specific environments (such as test_operator_gpu.py), you may not find a certain test in the file named in the log. In such a case, please search within "tests/python/" to find where the actual test code is, whether you want to make changes for debugging purposes or to fix the flakiness.
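
A minimal sketch of such a search, assuming it is run from the repository root (a plain text search over "tests/python/" with your editor or grep works just as well; the test name below is a hypothetical placeholder):

```python
from pathlib import Path

test_name = "test_yyy"  # hypothetical test name taken from the log

# Search every Python file under tests/python/ for the definition, since some
# files (e.g. test_operator_gpu.py) only import tests that are defined elsewhere.
for path in Path("tests/python").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if line.lstrip().startswith(f"def {test_name}("):
            print(f"{path}:{lineno}: {line.strip()}")
```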

...