Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We use Tesla V100 32GB GPUs and set batch size equal to 64 per GPU. Each machine has 8 V100 GPUs with NVLink-enabled. Machines are inter-connected with 100 Gbps Infiniband RoCEv2 network.

BytePS outperforms Horovod (carefully tuned) by 16% in this case, both with RDMA enabled.

...