MXNet's Continuous Integration system covers a wide variety of environments with the help of Docker. This ensures consistent test behaviour and reproducibility across multiple runs. This guide explains how to use the available tools to reproduce test results on your local machine.

1. Requirements

To run this toolchain, the following packages have to be installed. Please note that CPU tests can be run on macOS and Ubuntu, while GPU tests can only be executed under Ubuntu. Unfortunately, Windows builds and tests run without Docker and are thus not covered by this guide.

  • Docker
  • docker-compose
  • Python3
  • Optional: Nvidia-Docker (Ubuntu only, for GPU tests) 
  • Optional: GPU with CUDA Compute Capability ≥ 3.0
  • Disk space: at least 100GB (150GB recommended)
  • Code and Python dependencies, which are defined in ci/requirements.txt 
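The tool requirements listed above can be checked before starting. The following is a minimal sketch, not part of the MXNet tooling: it assumes the executables are named docker and docker-compose on your PATH and that the 100GB figure refers to free space on the filesystem holding the current directory. The function names are hypothetical.

```python
import shutil
import sys

# Executable names are assumptions derived from the requirements list.
REQUIRED_TOOLS = ["docker", "docker-compose"]
# Minimum free disk space named in the requirements list, in GB.
MIN_FREE_GB = 100


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def free_disk_gb(path="."):
    """Free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def check_requirements(path="."):
    """Collect readable problems; an empty list means every check passed."""
    problems = ["{} not found on PATH".format(t) for t in missing_tools()]
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    free = free_disk_gb(path)
    if free < MIN_FREE_GB:
        problems.append(
            "only {:.0f} GB free, at least {} GB needed".format(free, MIN_FREE_GB)
        )
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The sketch does not check the optional GPU requirements. The Python dependencies themselves are installed with the pip command that follows.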
```bash
pip3 install -r ci/requirements.txt --user
```

1.1. EC2 instances with automated setup

If you plan to use EC2 to reproduce the test results, you can set up your instance with the automated setup documented in MXNet Developer setup on AWS EC2.

Then clone the MXNet repository and either use dev_menu.py for common use cases or continue with the instructions below.

2. Reproducing failures

This part explains what commands to run in order to reproduce a failure at each stage.

2.1. Build

A build failure such as the one shown below can be reproduced by copying the failed command, which starts with ci/build.py, and running it on your local machine from the root of your MXNet source directory. This step requires neither a GPU nor CUDA dependencies.

...

In this case, you would run ci/build.py --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda8_cudnn5, which would produce output like that shown in the following image.
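If you prefer to script this step, the command quoted above can be assembled and launched from Python. This is only a sketch under assumptions: the helper names are hypothetical, the argument layout is copied from the example command, and ci/build.py is assumed to be executable in your checkout.

```python
import os
import subprocess


def build_command(platform, function):
    """Assemble the argument list in the same layout as the example command."""
    return [
        "ci/build.py",
        "--platform", platform,
        "/work/runtime_functions.sh", function,
    ]


def is_repo_root(path):
    """True if `path` looks like the root of an MXNet checkout (has ci/build.py)."""
    return os.path.isfile(os.path.join(path, "ci", "build.py"))


def run_build(platform, function, repo_root="."):
    """Run the build from the repository root and return its exit code."""
    if not is_repo_root(repo_root):
        raise FileNotFoundError("ci/build.py not found under " + repo_root)
    return subprocess.call(build_command(platform, function), cwd=repo_root)


if __name__ == "__main__":
    # Only print the command here; call run_build() to actually start Docker.
    print(" ".join(build_command("ubuntu_build_cuda",
                                 "build_ubuntu_gpu_cuda8_cudnn5")))
```

Checking for ci/build.py first gives a clear error when the script is started from the wrong directory.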

2.2. Test

Reproducing test failures requires an additional step because the MXNet binaries are not present in your local workspace.

...

In this case, the stash is labelled mkldnn_gpu. The easiest way to map this to a build step is to open the corresponding Jenkinsfile and search for pack_lib('mkldnn_gpu'. In this case, you will find a block like the following:

```groovy
def compile_unix_mkldnn_gpu() {
  return ['GPU: MKLDNN': {
    node(NODE_LINUX_CPU) {
      ws('workspace/build-mkldnn-gpu') {
        timeout(time: max_time, unit: 'MINUTES') {
          utils.init_git()
          utils.docker_run('ubuntu_build_cuda', 'build_ubuntu_gpu_mkldnn', false)
          utils.pack_lib('mkldnn_gpu', mx_mkldnn_lib, true)
        }
      }
    }
  }]
}
```
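The search for the stash label can also be scripted. The sketch below is illustrative only: it assumes the docker_run call that produces a stash appears before the matching pack_lib call, as in the block above, and that its first two arguments map onto ci/build.py in the same way as in the build example in section 2.1. The helper names are hypothetical.

```python
import re

# Matches e.g. docker_run('ubuntu_build_cuda', 'build_ubuntu_gpu_mkldnn', ...
DOCKER_RUN = re.compile(r"docker_run\(\s*'([^']+)'\s*,\s*'([^']+)'")

# Excerpt of the Jenkinsfile block shown above, used as sample input.
SAMPLE = """
          utils.init_git()
          utils.docker_run('ubuntu_build_cuda', 'build_ubuntu_gpu_mkldnn', false)
          utils.pack_lib('mkldnn_gpu', mx_mkldnn_lib, true)
"""


def find_build_step(jenkinsfile_text, stash_label):
    """Return (platform, function) of the last docker_run call that precedes
    pack_lib('<stash_label>', or None if no such call is found."""
    stash = re.search(r"\bpack_lib\('" + re.escape(stash_label) + "'",
                      jenkinsfile_text)
    if stash is None:
        return None
    calls = list(DOCKER_RUN.finditer(jenkinsfile_text, 0, stash.start()))
    if not calls:
        return None
    return calls[-1].groups()


def build_command(platform, function):
    """Assumed mapping onto ci/build.py, mirroring the section 2.1 example."""
    return "ci/build.py --platform {} /work/runtime_functions.sh {}".format(
        platform, function)


if __name__ == "__main__":
    step = find_build_step(SAMPLE, "mkldnn_gpu")
    if step is not None:
        print(build_command(*step))
```

To use it on a real checkout, read the Jenkinsfile into a string and pass it in place of SAMPLE.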

...

Please note the parameter --nvidiadocker in this example. It indicates that the test requires a GPU and can thus only be executed on an Ubuntu machine with Nvidia-Docker and a GPU installed. The result of this execution should look as follows:



3. Tips and Tricks

Repeating test execution

To check a test's robustness against flakiness, you might want to repeat its execution multiple times. This can be achieved with the MXNET_TEST_COUNT environment variable. The execution would look as follows:

...
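When the repetition is driven from a script, the variable has to be placed in the environment of the child process. The following sketch shows only that mechanism. The helper names are hypothetical, and the child command is a stand-in that echoes the variable rather than a real MXNet test invocation.

```python
import os
import subprocess
import sys


def env_with_test_count(count, base_env=None):
    """Copy the environment and set MXNET_TEST_COUNT to `count`."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_TEST_COUNT"] = str(count)
    return env


def run_with_test_count(command, count):
    """Run `command` with MXNET_TEST_COUNT set and capture its stdout."""
    return subprocess.run(
        command,
        env=env_with_test_count(count),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    # Stand-in for the real test command: it just prints the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MXNET_TEST_COUNT'])"]
    result = run_with_test_count(child, 1000)
    print("child saw MXNET_TEST_COUNT =", result.stdout.strip())
```

Replace the stand-in child command with the test command you copied from the CI log.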

Note: Additional options will be added once the flaky test detector is deployed.



4. Troubleshooting

In case you run into any issues, please try the following steps:

...