Inference Performance

The performance results in this section were gathered on an AWS EC2 C5.18xlarge instance using a single socket (one processor).

For throughput, 2 sockets provide about a 2x speedup, while latency remains roughly constant.

  • Performance boost with the Intel MKL-DNN backend in release 1.3 (see the measurement sketch after the table below)

    • w/o MKL-DNN: pip install mxnet==1.3.0

    • w/ MKL-DNN: pip install mxnet-mkl==1.3.0

Latency is measured at batchsize=1 (ms, lower is better) and throughput at batchsize=128 (fps, higher is better), unless a different batch size is noted in the cell.

| Category | Model | Latency, no MKL-DNN (ms) | Latency, 1.3 + MKL-DNN (ms) | Latency speedup | Throughput, no MKL-DNN (fps) | Throughput, 1.3 + MKL-DNN (fps) | Throughput speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN/classification | ResNet-50 v1 | 97.19 | 18.94 | 5.13 | 10.29 | 132.05 | 12.84 |
| | ResNet-50 v2 | 98.69 | 18.93 | 5.21 | 9.94 | 127.17 | 12.79 |
| | Inception v3 | 175.17 | 26.34 | 6.65 | 5.74 | 110.00 | 19.16 |
| | Inception v4 | 330.93 | 66.96 | 4.94 | 3.04 | 59.28 | 19.47 |
| | DenseNet | 111.66 | 53.31 | 2.09 | 8.52 | 121.79 | 14.30 |
| | MobileNet | 38.56 | 7.32 | 5.27 | 24.87 | 380.54 | 15.30 |
| | VGG16 | 406.50 | 40.08 | 10.14 | 2.91 | 69.84 | 23.96 |
| | AlexNet | 64.60 | 4.33 | 14.90 | 26.58 | 689.86 | 25.96 |
| | Inception-ResNet v2 | 181.10 | 111.28 | 1.63 | 5.48 | 69.39 | 12.66 |
| CNN/object detection | Faster R-CNN | 1175.74 | 95.15 | 12.36 | 0.85 | 10.51 | 12.36 |
| | SSD-VGG16 | 721.03 | 127.48 | 5.66 | 1.43 (batchsize=224) | 27.35 (batchsize=224) | 19.13 |
| | SSD-MobileNet | | 100.75 | | | 57.73 (batchsize=256) | |
| RNN | GNMT | 683.43 | 100.30 | 6.81 | 1.46 (batchsize=64) | 9.97 (batchsize=64) | 6.83 |
| GAN | DCGAN | 8.94 | 0.22 | 41.36 | 109.13 | 4059.74 | 37.20 |
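Numbers like those above can be reproduced with a short script. The following is only a minimal sketch, assuming the Gluon model zoo ResNet-50 v1 model, 224x224 inputs, and simple warm-up/iteration counts; the actual benchmark scripts and the single-socket thread pinning described above may differ.

```python
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

def benchmark(batch_size, num_iter=100, warmup=10):
    ctx = mx.cpu()
    net = vision.resnet50_v1(pretrained=True, ctx=ctx)
    net.hybridize()
    data = mx.nd.random.uniform(shape=(batch_size, 3, 224, 224), ctx=ctx)
    # Warm-up runs so that graph construction and (with mxnet-mkl)
    # MKL-DNN primitive creation are excluded from the timing.
    for _ in range(warmup):
        net(data).wait_to_read()
    start = time.time()
    for _ in range(num_iter):
        net(data).wait_to_read()
    elapsed = time.time() - start
    latency_ms = elapsed / num_iter * 1000.0
    throughput_fps = batch_size * num_iter / elapsed
    return latency_ms, throughput_fps

# Latency is reported at batchsize=1, throughput at batchsize=128.
print("latency (ms): %.2f" % benchmark(1)[0])
print("throughput (fps): %.2f" % benchmark(128)[1])
```

Whether the MKL-DNN backend is used depends on which package is installed (mxnet vs. mxnet-mkl); OpenMP thread settings should match the single-socket configuration used for the table.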


  • Performance gain from operator fusion by subgraph

Latency is measured at batchsize=1 (ms, lower is better) and throughput at batchsize=128 (fps, higher is better).

| Category | Model | Latency, R1.3 w/ MKL-DNN (ms) | Latency, master w/ subgraph (ms) | Latency speedup | Throughput, R1.3 w/ MKL-DNN (fps) | Throughput, master w/ subgraph (fps) | Throughput speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN/classification | ResNet-50 v1 | | | | | | |
| | ResNet-50 v2 | | | | | | |
| | Inception v3 | | | | | | |
| | Inception v4 | | | | | | |
| | DenseNet | | | | | | |
| | MobileNet | | | | | | |
| | VGG16 | | | | | | |
| | AlexNet | | | | | | |
| | Inception-ResNet v2 | | | | | | |
| CNN/object detection | Faster R-CNN | | | | | | |
| | SSD-VGG16 | | | | | | |
| | SSD-MobileNet | | | | | | |
| RNN | GNMT | | | | | | |
| GAN | DCGAN | | | | | | |
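The subgraph fusion path is selected at runtime. As a rough sketch (assuming a master build with MKL-DNN subgraph support), the MXNET_SUBGRAPH_BACKEND environment variable can be set to MKLDNN before running the same benchmark, so that eligible operator sequences are fused into MKL-DNN subgraph nodes:

```python
# Sketch: enable the MKL-DNN subgraph (operator fusion) backend.
# Assumption: requires a build that includes subgraph support
# (master at the time of writing); set the variable before importing mxnet.
import os
os.environ['MXNET_SUBGRAPH_BACKEND'] = 'MKLDNN'

import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v1(pretrained=True, ctx=mx.cpu())
net.hybridize()
out = net(mx.nd.ones((1, 3, 224, 224)))
out.wait_to_read()
```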




Inference Accuracy

The models are taken from the Gluon model zoo with pre-trained parameters. Top-1 and top-5 accuracy are verified with the MKL-DNN backend.

Inference Accuracy Comparison
| Alias | Network | # Parameters | GPU (cuDNN) top-1 | GPU (cuDNN) top-5 | CPU (no MKL-DNN) top-1 | CPU (no MKL-DNN) top-5 | CPU (MKL-DNN) top-1 | CPU (MKL-DNN) top-5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| alexnet | AlexNet | 61,100,840 | | | | | | |
| densenet121 | DenseNet-121 | 8,062,504 | | | | | | |
| densenet161 | DenseNet-161 | 28,900,936 | | | | | | |
| densenet169 | DenseNet-169 | 14,307,880 | | | | | | |
| densenet201 | DenseNet-201 | 20,242,984 | | | | | | |
| inceptionv3 | Inception V3 299x299 | 23,869,000 | | | | | | |
| mobilenet0.25 | MobileNet 0.25 | 475,544 | | | | | | |
| mobilenet0.5 | MobileNet 0.5 | 1,342,536 | | | | | | |
| mobilenet0.75 | MobileNet 0.75 | 2,601,976 | | | | | | |
| mobilenet1.0 | MobileNet 1.0 | 4,253,864 | | | | | | |
| mobilenetv2_1.0 | MobileNetV2 1.0 | 3,539,136 | | | | | | |
| mobilenetv2_0.75 | MobileNetV2 0.75 | 2,653,864 | | | | | | |
| mobilenetv2_0.5 | MobileNetV2 0.5 | 1,983,104 | | | | | | |
| mobilenetv2_0.25 | MobileNetV2 0.25 | 1,526,856 | | | | | | |
| resnet18_v1 | ResNet-18 V1 | 11,699,112 | | | | | | |
| resnet34_v1 | ResNet-34 V1 | 21,814,696 | | | | | | |
| resnet50_v1 | ResNet-50 V1 | 25,629,032 | | | | | | |
| resnet101_v1 | ResNet-101 V1 | 44,695,144 | | | | | | |
| resnet152_v1 | ResNet-152 V1 | 60,404,072 | | | | | | |
| resnet18_v2 | ResNet-18 V2 | 11,695,796 | | | | | | |
| resnet34_v2 | ResNet-34 V2 | 21,811,380 | | | | | | |
| resnet50_v2 | ResNet-50 V2 | 25,595,060 | | | | | | |
| resnet101_v2 | ResNet-101 V2 | 44,639,412 | | | | | | |
| resnet152_v2 | ResNet-152 V2 | 60,329,140 | | | | | | |
| squeezenet1.0 | SqueezeNet 1.0 | 1,248,424 | | | | | | |
| squeezenet1.1 | SqueezeNet 1.1 | 1,235,496 | | | | | | |
| vgg11 | VGG-11 | 132,863,336 | | | | | | |
| vgg13 | VGG-13 | 133,047,848 | | | | | | |
| vgg16 | VGG-16 | 138,357,544 | | | | | | |
| vgg19 | VGG-19 | 143,667,240 | | | | | | |
| vgg11_bn | VGG-11 with batch normalization | 132,874,344 | | | | | | |
| vgg13_bn | VGG-13 with batch normalization | 133,059,624 | | | | | | |
| vgg16_bn | VGG-16 with batch normalization | 138,374,440 | | | | | | |
| vgg19_bn | VGG-19 with batch normalization | 143,689,256 | | | | | | |
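A minimal sketch of how the rows above can be produced: each model is loaded by its alias from the Gluon model zoo with pretrained=True, its parameters are counted, and top-1/top-5 accuracy are computed with the standard metrics. `val_data`, an ImageNet validation iterator yielding preprocessed (data, label) batches, is a placeholder that must be supplied separately and is not part of this page.

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

def evaluate(alias, val_data, ctx=mx.cpu()):
    # Load the pre-trained model by its model-zoo alias (e.g. 'resnet50_v1').
    net = vision.get_model(alias, pretrained=True, ctx=ctx)
    num_params = sum(p.data(ctx).size for p in net.collect_params().values())
    top1 = mx.metric.Accuracy()
    top5 = mx.metric.TopKAccuracy(top_k=5)
    for data, label in val_data:
        output = net(data.as_in_context(ctx))
        top1.update([label.as_in_context(ctx)], [output])
        top5.update([label.as_in_context(ctx)], [output])
    return num_params, top1.get()[1], top5.get()[1]

# Running the same evaluation with mxnet vs. mxnet-mkl (and on GPU with cuDNN)
# fills in the corresponding columns of the table.
```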