...

The performance data in this group were collected on AWS EC2 instances using 1 socket.

For throughput, 2 sockets provide roughly a 2X speedup, while latency stays about the same.
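The table reports two different metrics: latency (wall-clock time for one batch of size 1) and throughput (images processed per second at a larger batch size). The benchmark scripts behind these numbers are not shown on this page, so the following is only a minimal sketch, assuming a hypothetical `run_batch` callable that runs one forward pass; `benchmark` and `fps_from_batch_time` are likewise illustrative names, not part of MXNet.

```python
import time
from typing import Callable, Tuple


def fps_from_batch_time(batch_size: int, seconds_per_batch: float) -> float:
    """Convert the wall-clock time of one batch into frames (images) per second."""
    return batch_size / seconds_per_batch


def benchmark(run_batch: Callable[[int], None], batch_size: int,
              iterations: int = 20, warmup: int = 5) -> Tuple[float, float]:
    """Time run_batch(batch_size) and return (ms per batch, throughput in fps).

    run_batch is a hypothetical callable that runs one forward pass on a batch.
    """
    # Warm-up runs are excluded so one-time setup cost does not skew the numbers.
    for _ in range(warmup):
        run_batch(batch_size)
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch(batch_size)
    seconds_per_batch = (time.perf_counter() - start) / iterations
    return seconds_per_batch * 1000.0, fps_from_batch_time(batch_size, seconds_per_batch)


if __name__ == "__main__":
    # Stand-in workload: sleep 1 ms per call instead of running a real model.
    fake_model = lambda bs: time.sleep(0.001)
    latency_ms, _ = benchmark(fake_model, batch_size=1)        # latency-style run
    _, throughput_fps = benchmark(fake_model, batch_size=128)  # throughput-style run
    print(f"latency {latency_ms:.2f} ms, throughput {throughput_fps:.1f} fps")
```

With a real model, `run_batch` must block until the result is actually computed. MXNet executes operators asynchronously, so a wrapper would typically call `mx.nd.waitall()` (or read the output) before returning; otherwise the timer only measures how long it takes to enqueue the work.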

  • Performance boost with Intel MKL-DNN backend in release 1.3

...

    • With MKL-DNN: pip install mxnet-mkl==1.3.0


Latency: batch size 1, in ms (smaller is better). Throughput: batch size 128, in fps (bigger is better), unless another batch size (bs) is shown next to the value. "no MKL-DNN" is the build without MKL-DNN; "1.3+MKL-DNN" is release 1.3 with MKL-DNN. n/a: no value reported.

Category              Model                Lat. no MKL-DNN  Lat. 1.3+MKL-DNN  Speedup  Thr. no MKL-DNN  Thr. 1.3+MKL-DNN  Speedup
CNN/classification    ResNet-50 v1         97.19            18.94             5.13     10.29            132.05            12.84
                      ResNet-50 v2         98.69            18.93             5.21     9.94             127.17            12.79
                      Inception v3         175.17           26.34             6.65     5.74             110.00            19.16
                      Inception v4         330.93           66.96             4.94     3.04             59.28             19.47
                      DenseNet             111.66           53.31             2.09     8.52             121.79            14.30
                      MobileNet            38.56            7.32              5.27     24.87            380.54            15.30
                      VGG16                406.50           40.08             10.14    2.91             69.84             23.96
                      AlexNet              64.60            4.33              14.90    26.58            689.86            25.96
                      Inception-ResNet v2  181.10           111.28            1.63     5.48             69.39             12.66
CNN/object detection  Faster R-CNN         1175.74          95.15             12.36    0.85             10.51             12.36
                      SSD-VGG16            721.03           127.48            5.66     1.43 (bs=224)    27.35 (bs=224)    19.13
                      SSD-MobileNet        n/a              100.75            n/a      n/a              57.73 (bs=256)    n/a
RNN                   GNMT                 683.43           100.30            6.81     1.46 (bs=64)     9.97 (bs=64)      6.83
GAN                   DCGAN                8.94             0.22              41.36    109.13           4059.74           37.20
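Each speedup column is a ratio of the two measurements next to it: for latency, where smaller is better, it is the no-MKL-DNN value divided by the MKL-DNN value; for throughput, where bigger is better, it is the MKL-DNN value divided by the no-MKL-DNN value. The short sketch below recomputes a few rows from the table. The `RESULTS` dictionary and helper names are illustrative only, and recomputed values can differ from the table in the last digit (for example ResNet-50 v1 throughput, or DCGAN latency where 0.22 ms is heavily rounded), presumably because the published ratios were computed from unrounded measurements.

```python
# Values copied from the table:
# (latency no MKL-DNN ms, latency MKL-DNN ms, throughput no MKL-DNN fps, throughput MKL-DNN fps)
RESULTS = {
    "ResNet-50 v1": (97.19, 18.94, 10.29, 132.05),
    "MobileNet": (38.56, 7.32, 24.87, 380.54),
    "GNMT (throughput at bs=64)": (683.43, 100.30, 1.46, 9.97),
}


def latency_speedup(base_ms: float, new_ms: float) -> float:
    # Lower latency is better, so speedup = old / new.
    return base_ms / new_ms


def throughput_speedup(base_fps: float, new_fps: float) -> float:
    # Higher throughput is better, so speedup = new / old.
    return new_fps / base_fps


for model, (lat_base, lat_new, thr_base, thr_new) in RESULTS.items():
    print(f"{model}: latency {latency_speedup(lat_base, lat_new):.2f}X, "
          f"throughput {throughput_speedup(thr_base, thr_new):.2f}X")
```

A throughput speedup is only meaningful when both columns use the same batch size, which is why rows such as GNMT and SSD-VGG16 note the batch size on both the baseline and the MKL-DNN value.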



  • Performance gain from operator fusion by subgraph

...