...

  1. Usability - Users do not have to experiment with the number of workers and the number of servers to get the best performance out of the box.

  2. Performance - Horovod + TensorFlow has shown 2x the performance of Distributed TensorFlow [1], so we expect it to compare well to the parameter server approach.

  3. Cost savings - Parameter servers are not needed when using Horovod.

  4. Simplified architecture - Leverages battle-tested libraries such as MPI and NCCL.

  5. Profiler - Horovod has an excellent profiler for finding bottlenecks.

  6. Online learning - Because of its MPI paradigm, every worker holds the complete model and optimizer state, so Horovod can save checkpoints, which enables online learning and fine-tuning of your model. With a parameter server, saving the optimizer state stored on the servers takes additional work, but with Horovod this feature comes for free.

  7. Community - Horovod is a way for MXNet to leverage the deep learning community for advancements in distributed training, and to increase MXNet's visibility.

...

  1. We will need to modify "setup.py" to install Horovod by linking to the MXNet shared library "libmxnet.so".
  2. Add "mxnet_imagenet_resnet50.py" and "mxnet_mnist.py" scripts to "horovod/examples". These scripts must expose the Optimizer object.
  3. We need to add a folder called "horovod/mxnet", parallel to "horovod/pytorch" and "horovod/tensorflow", that will (see the sketch after this list):

    • wrap the NDArray objects

    • wrap the mxnet.Optimizer object
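
A minimal sketch of what the "horovod/mxnet" wrapper could look like, modeled on the existing "horovod/pytorch" module. The names DistributedOptimizer and allreduce_ are assumptions following Horovod's conventions, not a finalized API:

    import mxnet as mx
    from horovod.mxnet.mpi_ops import allreduce_  # assumed C++-backed in-place op

    class DistributedOptimizer(mx.optimizer.Optimizer):
        """Wraps an mxnet.Optimizer and averages gradients across all
        workers before applying the wrapped optimizer's update rule."""

        def __init__(self, optimizer):
            super(DistributedOptimizer, self).__init__()
            self._optimizer = optimizer

        def create_state(self, index, weight):
            return self._optimizer.create_state(index, weight)

        def update(self, index, weight, grad, state):
            # Average the gradient NDArray across workers, then perform
            # the usual local update with the averaged gradient.
            allreduce_(grad, average=True, name=str(index))
            self._optimizer.update(index, weight, grad, state)

Wrapping at the Optimizer level means a training script only has to swap in the wrapped optimizer, which is presumably why the example scripts above must expose the Optimizer object.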

...

  • void MXWaitforHorovodAllreduce(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*cb)()))
  • void MXWaitforHorovodBroadcast(NDArray* input, NDArray* output, bool average, char* name, void (*func)(NDArray*, NDArray*, bool, char*, void (*cb)()))

...

  • input tells MXNet which NDArray must be locked; it is also passed back to Horovod
  • output tells MXNet which NDArray must be locked; it is also passed back to Horovod
  • average is a parameter that MXNet passes back to Horovod
  • name is a parameter that MXNet passes back to Horovod
  • func is the function that MXNet calls inside mxnet::Engine::PushAsync(); the parameters above are passed back to Horovod through it
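
To illustrate the flow, the Python side of Horovod could invoke this C API through MXNet's ctypes bindings roughly as follows; the wrapper name allreduce_async and the callback plumbing are hypothetical:

    import ctypes
    from mxnet.base import _LIB, c_str, check_call

    def allreduce_async(tensor, output, average, name, horovod_cb):
        # horovod_cb is the Horovod-side function pointer (the func
        # parameter above); MXNet's engine invokes it from
        # mxnet::Engine::PushAsync() once both NDArrays are locked.
        check_call(_LIB.MXWaitforHorovodAllreduce(
            tensor.handle,           # input: NDArray to lock, passed back to Horovod
            output.handle,           # output: NDArray to lock, passed back to Horovod
            ctypes.c_bool(average),  # passed back to Horovod unchanged
            c_str(name),             # passed back to Horovod unchanged
            horovod_cb))             # func: called inside Engine::PushAsync()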

...

For a better, long-term solution, it may be necessary to introduce a mechanism that uses the CUDA_VISIBLE_DEVICES environment variable to make only the assigned GPU visible to each process.
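
A minimal sketch of such a mechanism, assuming an Open MPI launcher that exports OMPI_COMM_WORLD_LOCAL_RANK:

    import os

    # Must run before importing any module that initializes CUDA, so
    # that each process sees exactly one GPU.
    local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

    import mxnet as mx  # imported only after the environment is set
    ctx = mx.gpu(0)     # the single visible GPU always appears as device 0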

Linking to MXNet shared library

Since we are linking against the MXNet shared library, we need to include the correct headers in the PyPI package. To avoid ABI compatibility issues, we may need to add additional APIs (e.g. mx.config.get_compile_flags or mx.config.get_link_flags) that return the compilation and linker flags, respectively. The Horovod installation can then proceed using exactly the same flags.
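
If such APIs existed, Horovod's "setup.py" could build its MXNet extension with matching flags along these lines (a sketch; mx.config.get_compile_flags and mx.config.get_link_flags are the APIs proposed above, not existing ones, and the extension name and source file are illustrative):

    import mxnet as mx
    from setuptools import setup, Extension

    mxnet_ext = Extension(
        "horovod.mxnet.mpi_lib",
        sources=["horovod/mxnet/mpi_ops.cc"],
        extra_compile_args=mx.config.get_compile_flags(),  # proposed API
        extra_link_args=mx.config.get_link_flags(),        # proposed API
    )

    setup(name="horovod", ext_modules=[mxnet_ext])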

...