...

$ mpirun -np 8 --hostfile ~/hosts --bind-to none --map-by slot -x NCCL_DEBUG=INFO -x NCCL_MIN_NRINGS=4 -x LD_LIBRARY_PATH -x PATH -x MXNET_USE_OPERATOR_TUNING=0 -mca pml ob1 -mca btl ^openib python3 /home/ubuntu/master/3rdparty/horovod/examples/mxnet_imagenet_resnet50.py --benchmark 1 --batch-size=256 --network resnet-v1 --num-layers=50 --num-epochs 1 --kv-store horovod None --dtype float16 --gpus 0

Note: This call is valid when using OpenMPI. To improve user experience, this training script may be wrapped into a Bash script in the future.

...

This behaviour is the same as when doing pip install horovod for TensorFlow and PyTorch support. When those libraries are not present, Horovod installation will fail.


(a) design using callbacks    (b) simplified design

Figure 2. How Horovod interacts with MXNet engine

...

Since CPU support and GPU fp16 support are listed as experimental at the moment, we do not have performance numbers for them.

Addition of New APIs

We are introducing two new functions, MXWaitForHorovodAllreduce and MXWaitForHorovodBroadcast, to the MXNet C API. They take the following form:

  • void MXWaitForHorovodAllreduce(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*)(Engine*, void*)))
  • void MXWaitForHorovodBroadcast(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*)(Engine*, void*)))

The parameters are:

...
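
The nested function-pointer types in the signatures above are easier to read once they are given names. The sketch below is illustrative only and is not MXNet or Horovod code: NDArray and Engine are reduced to hypothetical stand-in structs, the OnComplete and HorovodOp aliases are names invented here, and MockWaitForHorovodAllreduce simply calls the supplied function right away, whereas the real function is assumed to hand it to the MXNet engine.

```cpp
#include <cassert>

// Hypothetical stand-ins; MXNet's real NDArray and Engine are much richer.
struct NDArray { float value; };
struct Engine {};

// Completion callback type: the innermost function pointer in the signatures.
using OnComplete = void (*)(Engine*, void*);

// Type of the `func` argument: the Horovod-side operation to be run.
using HorovodOp = void (*)(NDArray*, NDArray*, bool, char*, OnComplete);

// Set by the completion callback so a caller can observe that it fired.
static bool g_completed = false;

static void MarkComplete(Engine*, void*) { g_completed = true; }

// Mock operation: with a single worker, allreduce is a copy and the
// average divides by a worker count of 1.
static void MockAllreduce(NDArray* input, NDArray* output, bool average,
                          char* /*name*/, OnComplete on_complete) {
  const float num_workers = 1.0f;
  output->value = average ? input->value / num_workers : input->value;
  on_complete(nullptr, nullptr);  // signal completion to the caller
}

// Mock of the proposed entry point (assumption: the real one schedules
// `func` on the engine; this one runs it immediately).
void MockWaitForHorovodAllreduce(NDArray* input, NDArray* output, bool average,
                                 char* name, HorovodOp func) {
  func(input, output, average, name, &MarkComplete);
}
```

With these mocks, calling MockWaitForHorovodAllreduce on an input holding 3 with average set to true leaves 3 in the output and sets g_completed, which shows the order of events the signature implies: the caller passes the operation in, the operation does its work, then it invokes the completion callback.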


Test Plan


Functionality Tests

...