...

  1. Usability - Users do not have to experiment with the number of workers and the number of servers to get the best performance out of the box.

  2. Performance - Horovod + TensorFlow has shown 2x the performance of Distributed TensorFlow [1], so we expect it to compare well to the parameter server approach.

  3. Cost savings - Parameter servers are not needed when using Horovod.

  4. Simplified architecture - Leverages battle-tested libraries such as MPI and NCCL.

  5. Profiler - Horovod has an excellent profiler for finding bottlenecks.

  6. Online learning - Because of its MPI paradigm, every worker holds the complete model and optimizer state, so Horovod can save checkpoints, which enables online learning and fine-tuning of your model. With a parameter server, saving the optimizer state stored on the servers takes additional work, but with Horovod this feature comes for free.

  7. Community - Horovod is a way for MXNet to leverage the deep learning community for advancements in distributed training, and to increase MXNet's visibility.

...

  1. We will need to modify "setup.py" to install Horovod by linking to the MXNet shared library "libmxnet.so".
  2. Add "mxnet_imagenet_resnet50.py" and "mxnet_mnist.py" scripts to "horovod/examples". These scripts must expose the Optimizer object.
  3. We need to add a folder called "horovod/mxnet", parallel to "horovod/pytorch" and "horovod/tensorflow", that will (see the sketch after this list):

    • wrap the NDArray objects

    • wrap the mxnet.Optimizer object
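
A minimal sketch of what the "horovod/mxnet" wrapper could look like, modeled on the existing "horovod/pytorch" module. The names DistributedOptimizer and allreduce_ are assumptions following Horovod's conventions, not a finalized API:

    import mxnet as mx
    from horovod.mxnet.mpi_ops import allreduce_  # assumed C++-backed in-place op

    class DistributedOptimizer(mx.optimizer.Optimizer):
        """Wraps an mxnet.Optimizer and averages gradients across all
        workers before applying the wrapped optimizer's update rule."""

        def __init__(self, optimizer):
            super(DistributedOptimizer, self).__init__()
            self._optimizer = optimizer

        def create_state(self, index, weight):
            return self._optimizer.create_state(index, weight)

        def update(self, index, weight, grad, state):
            # Average the gradient NDArray across workers, then perform
            # the usual local update with the averaged gradient.
            allreduce_(grad, average=True, name=str(index))
            self._optimizer.update(index, weight, grad, state)

Wrapping at the Optimizer level means a training script only has to swap in the wrapped optimizer, which is presumably why the example scripts above must expose the Optimizer object.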

...

  • void MXWaitforHorovodAllreduce(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*cb)()))
  • void MXWaitforHorovodBroadcast(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*cb)()))

...

  • input tells MXNet which NDArray must be locked; it is also passed back to Horovod
  • output tells MXNet which NDArray must be locked; it is also passed back to Horovod
  • average is a parameter that MXNet passes back to Horovod
  • name is a parameter that MXNet passes back to Horovod
  • func is the function that MXNet calls inside mxnet::Engine::PushAsync(); the parameters above are passed back to Horovod through it
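
To illustrate the flow, the Python side of Horovod could invoke this C API through MXNet's ctypes bindings roughly as follows; the wrapper name allreduce_async and the callback plumbing are hypothetical:

    import ctypes
    from mxnet.base import _LIB, c_str, check_call

    def allreduce_async(tensor, output, average, name, horovod_cb):
        # horovod_cb is the Horovod-side function pointer (the func
        # parameter above); MXNet's engine invokes it from
        # mxnet::Engine::PushAsync() once both NDArrays are locked.
        check_call(_LIB.MXWaitforHorovodAllreduce(
            tensor.handle,           # input: NDArray to lock, passed back to Horovod
            output.handle,           # output: NDArray to lock, passed back to Horovod
            ctypes.c_bool(average),  # passed back to Horovod unchanged
            c_str(name),             # passed back to Horovod unchanged
            horovod_cb))             # func: called inside Engine::PushAsync()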

...

For a better, long-term solution, it may be necessary to introduce a mechanism that uses the CUDA_VISIBLE_DEVICES environment variable to make only the assigned GPU visible to each process.
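
A minimal sketch of such a mechanism, assuming an Open MPI launcher that exports OMPI_COMM_WORLD_LOCAL_RANK:

    import os

    # Must run before importing any module that initializes CUDA, so
    # that each process sees exactly one GPU.
    local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

    import mxnet as mx  # imported only after the environment is set
    ctx = mx.gpu(0)     # the single visible GPU always appears as device 0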

Linking to MXNet shared library

Since we are linking against the MXNet shared library, we need to include the correct headers in the PyPI package. To avoid ABI compatibility issues, we may need to add additional APIs (e.g. mx.config.get_compile_flags or mx.config.get_link_flags) that return the compilation and linker flags, respectively. The Horovod installation can then proceed using exactly the same flags.
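
If such APIs existed, Horovod's "setup.py" could build its MXNet extension with matching flags along these lines (a sketch; mx.config.get_compile_flags and mx.config.get_link_flags are the APIs proposed above, not existing ones, and the extension name and source file are illustrative):

    import mxnet as mx
    from setuptools import setup, Extension

    mxnet_ext = Extension(
        "horovod.mxnet.mpi_lib",
        sources=["horovod/mxnet/mpi_ops.cc"],
        extra_compile_args=mx.config.get_compile_flags(),  # proposed API
        extra_link_args=mx.config.get_link_flags(),        # proposed API
    )

    setup(name="horovod", ext_modules=[mxnet_ext])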

...