...

This project is currently aimed at unit tests written in Python, since these make up the majority of our tests on CI. While the tool could in principle be applied to integration tests, the time required to run them repeatedly would make such coverage impractical. The main goal of this project is to catch flaky tests with low effort and high impact.

Design

The flaky test bot is composed of three components: Diff Collator, Test Selector, and Flakiness Checker. Each stage forms part of the flaky test detector, but the stages are decoupled from one another. Each stage exposes human-understandable input and output to facilitate debugging and extensibility. For example, the flakiness checker may be useful as a standalone tool for developers working on flaky test fixes or implementing new test cases.


Diff Collator

The purpose of the diff collator is to retrieve a list of changes that have been made to the code and sanitize it before passing it to the next stage. This is done by taking the output of git diff and parsing it to obtain file names, top-level functions, and line numbers for each code change. For the purposes of the flaky test detector, only file and function names are needed for the dependency analyzer.
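The following sketch illustrates this parsing step. It reads unified diff text and records the changed line ranges on the new side of each file. It then uses Python's ast module to map a range to the top-level functions that contain it. The sketch assumes the diff was produced with zero context lines (git diff -U0). The names parse_unified_diff and top_level_functions are hypothetical and are not the diff collator's actual API. The end_lineno attribute requires Python 3.8 or later.

```python
import ast
import re
from collections import defaultdict

# A hunk header such as "@@ -70,0 +74 @@ def foo():". Only the new-side start
# and (optional) length are captured; git omits the length when it is 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def parse_unified_diff(diff_text):
    """Map each changed file to a list of (first, last) line ranges on the new side."""
    changes = defaultdict(list)
    current = None
    prev = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ ") and prev.startswith("--- "):
            # File header; "+++ /dev/null" (a deleted file) has no new-side name.
            target = line[4:]
            current = target[2:] if target.startswith("b/") else None
        else:
            m = HUNK_RE.match(line)
            if m and current is not None:
                start = int(m.group(1))
                length = int(m.group(2)) if m.group(2) is not None else 1
                # A zero-length range is a pure deletion; record the line it follows.
                last = start + length - 1 if length > 0 else start
                changes[current].append((start, last))
        prev = line
    return dict(changes)


def top_level_functions(source, first, last):
    """Names of top-level functions whose source lines overlap [first, last]."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= last and first <= node.end_lineno:
                names.append(node.name)
    return names
```

With -U0, a one-line change produces a range such as (74, 74), which matches the start:end pairs in the sample output further down.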

Usage

  • Targets: By default, the diff collator uses master and HEAD as the targets for git diff. Other targets can be given with the --commits (-c) option, which compares two commits directly, or with the --branches (-b) option, which compares the second commit to the common ancestor of the two.
  • Verbosity: Three verbosity levels can be selected by repeating the --verbosity (-v) option (e.g. -vv corresponds to level 2). Level 1 outputs only file names, level 2 outputs files with functions, and level 3 outputs files, functions, and line numbers. The default is level 2.
  • Filters: Restrict which files are included in the diff. --path (-p) limits the diff to a given directory, and --filter (-f) keeps only files whose names match a Python regular expression. See https://docs.python.org/2/library/re.html for details.
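The sketch below shows one way these options could be wired up with argparse. It makes several assumptions that are not taken from the tool itself:

- --commits and --branches each take exactly two revisions.
- --branches maps to git's three-dot form. In git, A...B diffs B against the merge base of A and B.
- --filter is applied with re.match.
- The -U0 flag is passed to git diff.

The helper names are also hypothetical.

```python
import argparse
import re


def build_parser():
    parser = argparse.ArgumentParser(description="List code changes reported by git diff")
    targets = parser.add_mutually_exclusive_group()
    # Assumption: each option takes exactly two revisions.
    targets.add_argument("-c", "--commits", nargs=2, metavar=("OLD", "NEW"),
                         help="compare two commits directly")
    targets.add_argument("-b", "--branches", nargs=2, metavar=("BASE", "TIP"),
                         help="compare TIP against its common ancestor with BASE")
    # action="count" turns -v, -vv, -vvv into 1, 2, 3.
    parser.add_argument("-v", "--verbosity", action="count", default=None,
                        help="1: files, 2: files and functions, 3: also line numbers")
    parser.add_argument("-p", "--path", help="only diff files under this directory")
    parser.add_argument("-f", "--filter", help="only keep files matching this regex")
    return parser


def verbosity_level(args):
    # Fall back to the documented default of 2 when no -v flag is given.
    return 2 if args.verbosity is None else min(args.verbosity, 3)


def git_diff_command(args):
    cmd = ["git", "diff", "-U0"]  # assumption: zero context lines
    if args.commits:
        cmd += list(args.commits)  # compare OLD and NEW directly
    elif args.branches:
        base, tip = args.branches
        cmd.append("{}...{}".format(base, tip))  # three-dot: TIP vs merge base
    else:
        cmd += ["master", "HEAD"]  # documented default targets
    if args.path:
        cmd += ["--", args.path]
    return cmd


def keep_file(args, filename):
    # Assumption: the pattern is anchored at the start of the path (re.match).
    return args.filter is None or re.match(args.filter, filename) is not None
```

The list returned by git_diff_command can be passed to subprocess.check_output to obtain the raw diff text.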

Sample output:

python tools/flaky_test_bot/diff_collator.py -vvv --filter .*dependency_analyzer\.py
tools/flaky_test_bot/dependency_analyzer.py

...

        74:74
        103:103

...

Dependency Analyzer

The dependency analyzer, which acts as the test selector, must be able to handle several different cases of code changes:

...

Within a single file, this simply means parsing the file for calls to the changed functions and returning the top-level callers. Across files, cross-file dependencies are stored in a config file written in JSON. When a dependency listed in the config file is selected, its dependents are automatically selected as well. This avoids having to parse every test file, but it also means that the config file must be updated whenever cross-file dependencies change. Currently this is done manually, since the structure of our tests rarely changes; however, an automated solution may be worth implementing.

Usage

The dependency analyzer takes as input a list of function names and their associated files in the format <file>:<function-name>. It returns a list, in the same format, of the top-level functions (with their files) that depend on the functions in the input.
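The cross-file step and the <file>:<function-name> format can be sketched as follows. The JSON layout here is a hypothetical schema chosen for illustration: it maps each "<file>:<function-name>" dependency to a list of dependents in the same format. The file and function names in the example are also made up. Dependents that are themselves listed in the config are followed transitively.

```python
import json


def parse_specifier(spec):
    """Split '<file>:<function-name>' into (file, function)."""
    path, sep, func = spec.rpartition(":")
    if not sep or not path or not func:
        raise ValueError("expected <file>:<function-name>, got %r" % spec)
    return path, func


def expand_cross_file(specs, config):
    """Return specs plus every dependent reachable through the config, sorted."""
    seen = set()
    stack = list(specs)
    while stack:
        spec = stack.pop()
        if spec in seen:
            continue
        parse_specifier(spec)  # reject malformed entries early
        seen.add(spec)
        # Hypothetical schema: {"<file>:<function>": ["<file>:<function>", ...]}
        stack.extend(config.get(spec, []))
    return sorted(seen)


def load_config(path):
    """Read the (hypothetical) cross-file dependency config from a JSON file."""
    with open(path) as f:
        return json.load(f)
```

Sorting the output keeps it stable from run to run, which helps when comparing the analyzer's output by eye.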

Flakiness Checker

The flakiness checker tool is a script that checks a test for flakiness by running it a large number of times. Its primary purpose is to serve as a component of the automated flaky test detection system. However, it can also be used manually by developers for flaky test fixes or when modifying test files. Before submitting a new test case or a modification to an existing test, run the flakiness checker script to test for flakiness.
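The core idea is to run one test many times and count the failures. The sketch below shows this for a test that is an in-process Python callable. Before each trial, it seeds Python's global random module with a fresh seed and remembers the first seed that failed, so the failure can be replayed. The name check_flakiness, its parameters, and the seeding scheme are hypothetical. The real script works from test specifiers rather than callables.

```python
import random


def check_flakiness(test_fn, num_trials=10000, seed=None):
    """Run test_fn num_trials times; return (failure count, first failing seed)."""
    rng = random.Random(seed)  # separate generator that hands out per-trial seeds
    failures = 0
    first_failing_seed = None
    for _ in range(num_trials):
        trial_seed = rng.randrange(2 ** 31)
        random.seed(trial_seed)  # make this trial reproducible
        try:
            test_fn()
        except Exception:  # count assertion failures and errors alike
            failures += 1
            if first_failing_seed is None:
                first_failing_seed = trial_seed
    return failures, first_failing_seed
```

Dividing the failure count by num_trials gives a rough estimate of how often the test fails.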

...

Below is a table of the number of runs needed to observe at least one failure, at a given confidence level, for a test that passes with a given probability (its success rate). By default, CI runs each test 10,000 times.

Runs needed (rows: confidence level; columns: test success rate)

confidence \ success rate    99%     99.9%     99.99%
99%                          458     4,603     46,049
99.9%                        687     6,904     69,074
99.99%                       916     9,205     92,099
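The formula behind the table is as follows. If each run independently passes with probability p (the success rate), then n runs all pass with probability p^n. Observing at least one failure with confidence c therefore requires p^n <= 1 - c, i.e. n >= ln(1 - c) / ln(p). The sketch below computes this bound, rounded up, along with the confidence achieved by a fixed number of runs. The table's entries agree with it to within one run. The function names are hypothetical.

```python
import math


def runs_needed(success_rate, confidence):
    """Smallest n such that P(at least one failure in n runs) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(success_rate))


def detection_confidence(success_rate, runs):
    """Probability of seeing at least one failure in `runs` independent runs."""
    return 1.0 - success_rate ** runs
```

Consider the CI default of 10,000 runs. For a test with a 99.9% success rate, detection_confidence(0.999, 10000) is about 0.99995. For a test with a 99.99% success rate, it is only about 0.63.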

Usage

python flakiness_checker.py [optional_arguments] <test-specifiers>

where <test-specifiers> is a space-separated list of test specifiers, each identifying a test to run. Specifiers can come in two formats:

  1. <file-name>.<test-name>, as is common in the GitHub repository (e.g. test_example.test_flaky)
  2. <directory>/<file>:<test-name>, like the input to nosetests (e.g. tests/python/unittest/test_example.py:test_flaky). Note: the directory can be either relative or absolute. Additionally, if the full path to the file is not given, the script searches the given directory for the provided file.
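The sketch below shows how the two specifier formats might be parsed, and how a file could be located when only a partial path is given. The following are illustrative assumptions, not the script's documented behavior:

- The names parse_test_specifier and locate_test_file.
- The assumption that <file-name> maps to <file-name>.py.
- Searching the specifier's directory (or the current directory, for the first format) recursively with os.walk.

```python
import os


def parse_test_specifier(spec):
    """Return (file_path, test_name) for either supported specifier format."""
    if ":" in spec:
        # Format 2: <directory>/<file>:<test-name>
        path, _, test = spec.rpartition(":")
    else:
        # Format 1: <file-name>.<test-name>; assumed to refer to <file-name>.py
        module, _, test = spec.rpartition(".")
        path = module + ".py" if module else ""
    if not path or not test:
        raise ValueError("unrecognized test specifier: %r" % spec)
    return path, test


def locate_test_file(path, default_root="."):
    """Return an absolute path to the test file, searching a directory if needed."""
    if os.path.isfile(path):
        return os.path.abspath(path)
    # Search the given directory (or default_root) recursively for the file name.
    search_root = os.path.dirname(path) or default_root
    name = os.path.basename(path)
    for dirpath, _, filenames in os.walk(search_root):
        if name in filenames:
            return os.path.abspath(os.path.join(dirpath, name))
    raise FileNotFoundError("could not find %s under %s" % (name, search_root))
```

Splitting on the last colon keeps absolute paths with drive letters (such as C:\...) intact in the second format.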

...