...

This project is currently aimed at unit tests written in Python, since these comprise the majority of our tests on CI. While the tool could in principle be applied to integration tests, the time required to run them would make such coverage impractical. It also does not cover issues such as test-order dependency or flakiness that is only reproducible in a specific environment. The main goal of this project is to provide a baseline level of coverage that catches most flakiness with low effort and high impact.

...

Doing this within a single file simply means parsing the file for calls to the modified functions and returning the top-level callers. Across files, this is accomplished by storing cross-file dependencies in a JSON config file; when a dependency listed in the config file is selected, its dependents are automatically selected as well. This avoids having to parse every test file, but it also means that the config file must be updated whenever cross-file dependencies change.
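
As a rough illustration (not the tool's actual code), the within-file step could be sketched as follows. The function name select_dependent_tests and the changed_funcs argument, assumed to be produced by the diff collator, are hypothetical:

# Hedged sketch: given a test file and the names of functions changed in a diff,
# find the top-level test functions that call them, directly or through other
# helpers defined in the same file.
import ast

def select_dependent_tests(path, changed_funcs):
    with open(path) as f:
        tree = ast.parse(f.read())

    # Map each top-level function to the names it calls.
    calls = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }

    # Repeatedly select any function that calls an already-selected function,
    # so indirect (transitive) callers are picked up as well.
    selected = set(changed_funcs)
    changed = True
    while changed:
        changed = False
        for name, callees in calls.items():
            if name not in selected and callees & selected:
                selected.add(name)
                changed = True

    # Only report test cases themselves, i.e. top-level callers named test_*.
    return sorted(n for n in selected if n.startswith("test_"))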

Cross-file dependencies

As mentioned above, cross-file dependencies are handled using a config file. Below is an example of such a file.

{
    "tests/python/gpu/test_operator_gpu.py": [
        "test_operator.py",
        "test_optimizer.py",
        "test_random.py",
        "test_exc_handling.py",
        "test_sparse_ndarray.py",
        "test_sparse_operator.py",
        "test_ndarray.py"
    ],
    "tests/python/gpu/test_gluon_gpu.py": [
        "test_gluon.py",
        "test_loss.py",
        "test_gluon_rnn.py"
    ],
    "tests/python/mkl/test_quantization_mkldnn.py": [
        "test_quantization.py"
    ],
    "tests/python/quantization_gpu/test_quantization_gpu.py": [
        "test_quantization.py"
    ]
}

Each file that depends on tests in other files is listed along with its dependencies. In this example, test_gluon_gpu.py depends on three files: test_gluon.py, test_loss.py, and test_gluon_rnn.py. Currently, the config is maintained manually, since the structure of our tests rarely changes; however, an automated solution may be worth implementing.
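
As a rough sketch of how this config is used (not the actual implementation; the file name test_dependencies.json and the function name expand_selection are assumed for illustration), selection expansion could look like this:

# Hedged sketch: given the JSON config above and the set of test files touched
# by a pull request, add every file that depends on one of them.
import json
import os

def expand_selection(selected_files, config_path="test_dependencies.json"):
    with open(config_path) as f:
        dependencies = json.load(f)  # dependent file -> list of files it depends on

    expanded = set(selected_files)
    for dependent, deps in dependencies.items():
        # Compare on base names, since the config lists dependencies by file name only.
        if any(os.path.basename(f) in deps for f in selected_files):
            expanded.add(dependent)
    return expanded

# Example: a change that selects test_gluon.py would also select
# tests/python/gpu/test_gluon_gpu.py, per the config above.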

...

Below is a table indicating the number of runs needed to achieve a given confidence of catching a flaky test, given the probability that the test passes on any single run (its success rate). By default, on CI we use a value of 10,000 runs to check tests. This number gives us relatively high confidence that we are catching flakiness even if it only occurs in roughly 1 out of every 1,000 runs.

 success rate \ confidence |    99% |  99.9% | 99.99%
 99%                       |    458 |    687 |    916
 99.9%                     |  4,603 |  6,904 |  9,205
 99.99%                    | 46,049 | 69,074 | 92,099
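
The table follows from requiring that at least one of n independent runs fails: if a flaky test passes a single run with probability p (the success rate), then all n runs pass with probability p^n, so flakiness is caught with confidence 1 - p^n. Requiring 1 - p^n >= c gives n >= ln(1 - c) / ln(p). The snippet below simply evaluates this formula and is shown only as an illustration, not as part of the tool:

# Runs needed so that a test which passes a single run with probability p
# (the success rate) fails at least once with probability >= confidence.
import math

def runs_needed(success_rate, confidence):
    return math.ceil(math.log(1 - confidence) / math.log(success_rate))

for p in (0.99, 0.999, 0.9999):
    # Each line corresponds to a row of the table above (values agree up to rounding).
    print(p, [runs_needed(p, c) for c in (0.99, 0.999, 0.9999)])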

...

python flakiness_checker.py [optional_arguments] <test-specifiers>

where <test-specifiers> is a space-separated list of test specifiers: strings specifying which tests to run. Each specifier can come in one of two formats:

  1. <file-name>.<test-name>, as is common in the GitHub repository (e.g. test_example.test_flaky)
  2. <directory>/<file>:<test-name>, like the input to nosetests (e.g. tests/python/unittest/test_example.py:test_flaky). Note: the directory can be either relative or absolute, and if the full path is not given, the script will search the given directory for the provided file. Example invocations using both formats are shown below.
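
For example, using the example test names above, the following two invocations check the same test, one per format:

python flakiness_checker.py test_example.test_flaky
python flakiness_checker.py tests/python/unittest/test_example.py:test_flaky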

...

-s SEED, --seed SEED use SEED as the test seed, rather than a random seed

Integration

The integration plan for this project can be found here: Flaky Test Bot Integration Plan. The tool will run as a Jenkins job triggered on pull requests. The job executes the diff collator and dependency analyzer in one step to detect modified or added test cases. If any tests are selected, MXNet is compiled and the flakiness checker is run against them to check for flakiness.