...

Worker Num	Server Num	Per-Node FPS (pic/s)	Scaling Efficiency
8	8 (workers and servers share nodes)	19.87	67.81%
8	8	27.3	93.17%
8	4	22.7	77.47%
8	2	11.11	37.90%

Command line: python tools/launch.py -n 8 -s <server_num> --launcher ssh -H hosts python example/image-classification/train_vgg16.py --kv-store dist_sync

The following result is for MXNet multi-node training with MPI allreduce support, taken from our proof of concept (ready):

Node Num	Per Node FPS(pic/s)	Scaling Efficiency
8	27.76	94.74%

Command line: mpirun -n 8 -ppn 1 -machinefile hosts python example/image-classification/train_vgg16.py --kv-store dist_sync_mpi
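
The scaling-efficiency figures in both tables are consistent with dividing per-node throughput by a single-node baseline of roughly 29.3 pic/s. That baseline is not stated in this section; it is inferred here from the reported rows, so treat it as an assumption. A minimal sketch of the check, with hypothetical helper names:

```python
# Rows copied from the two tables above:
# (configuration label, per-node FPS in pic/s, reported scaling efficiency in %)
rows = [
    ("PS, 8 servers (shared nodes)", 19.87, 67.81),
    ("PS, 8 servers", 27.3, 93.17),
    ("PS, 4 servers", 22.7, 77.47),
    ("PS, 2 servers", 11.11, 37.90),
    ("MPI allreduce", 27.76, 94.74),
]


def implied_baseline(fps, efficiency_pct):
    """Single-node FPS implied by one row (assumes efficiency = fps / baseline)."""
    return fps / (efficiency_pct / 100.0)


def scaling_efficiency(fps, baseline_fps):
    """Scaling efficiency in percent under the same assumed definition."""
    return 100.0 * fps / baseline_fps


# Every row should imply about the same baseline if the assumed definition holds.
baselines = [implied_baseline(fps, eff) for _, fps, eff in rows]
baseline = sum(baselines) / len(baselines)

for (label, fps, reported), implied in zip(rows, baselines):
    recomputed = scaling_efficiency(fps, baseline)
    print(f"{label:30s} implied baseline {implied:6.2f}  "
          f"reported {reported:5.2f}%  recomputed {recomputed:5.2f}%")
```

All five rows imply a baseline within a few hundredths of 29.3 pic/s, which suggests the two tables share the same single-node reference.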

MPI allreduce scales well because it spends far less time in communication than the parameter server does when too few servers are allocated.
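
A rough first-order traffic model illustrates why the server count matters. It is not taken from the proof of concept; it assumes parameters are sharded evenly across servers, that allreduce uses a ring algorithm, and it ignores latency, worker-side traffic, and any overlap of communication with computation. The function names are hypothetical.

```python
def ps_server_traffic(model_size, workers, servers):
    """Data each server receives (and sends back) per iteration.

    Assumes every worker pushes its full gradient, split evenly across servers.
    """
    return workers * model_size / servers


def ring_allreduce_traffic(model_size, nodes):
    """Data each node sends per iteration in a ring allreduce
    (reduce-scatter followed by allgather)."""
    return 2 * (nodes - 1) * model_size / nodes


# Work in units of one model size so no absolute byte count is assumed.
MODEL = 1.0
WORKERS = 8

for servers in (8, 4, 2):
    load = ps_server_traffic(MODEL, WORKERS, servers)
    print(f"parameter server, {servers} servers: {load:.2f}x model size per server")

ring = ring_allreduce_traffic(MODEL, WORKERS)
print(f"ring allreduce, {WORKERS} nodes: {ring:.2f}x model size per node")
```

Under these assumptions the per-server load doubles each time the server count halves, while the per-node allreduce volume stays below twice the model size regardless of node count. This matches the trend in the measurements only qualitatively.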

...
