Introduction

This page presents benchmark results comparing MXNet 1.3.0 with and without MKL-DNN (integration proposal). The results show that MKL-DNN increases inference throughput by 6x to 37x and reduces latency by 2x to 41x, while accuracy is equivalent up to an epsilon of 1e-8.

Inference Performance

The performance results in this group were gathered on an AWS EC2 c5.18xlarge instance using 1 socket and 1 processor.

For throughput, using 2 sockets provides about a 2x speedup, while latency stays constant.

Performance

...

on Intel CPU with Intel MKL-DNN backend in release 1.3

    • w/o MKL-DNN, pip install mxnet==1.3.0

...

The c5.18xlarge instance offers a 2-socket Intel Xeon Platinum processor with 72 vCPUs.

$ export KMP_AFFINITY=granularity=fine,compact,1,0

$ export OMP_NUM_THREADS=18

$ numactl --physcpubind=0-17 --membind=0 python …
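The latency and throughput figures reported on this page can be understood through a simple timing loop. The following is a minimal sketch, not the script used for these results: `benchmark` and `fake_batch` are hypothetical names, and `fake_batch` is only a stand-in for a real forward pass.

```python
import time


def benchmark(run_batch, batch_size, warmup=5, iters=20):
    """Return (latency_ms, throughput_fps) for a callable that processes one batch."""
    # Warm-up iterations are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        run_batch()

    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start

    latency_ms = elapsed / iters * 1000.0          # average time per batch
    throughput_fps = batch_size * iters / elapsed  # samples processed per second
    return latency_ms, throughput_fps


def fake_batch():
    # Stand-in workload (assumption); replace with a forward pass that
    # blocks until the output is actually computed.
    sum(i * i for i in range(10000))


if __name__ == "__main__":
    latency_ms, fps = benchmark(fake_batch, batch_size=1)
    print(f"latency: {latency_ms:.2f} ms, throughput: {fps:.2f} fps")
```

Because MXNet executes operators asynchronously, a real `run_batch` would need to synchronize before returning (for example with `mx.nd.waitall()` or by calling `asnumpy()` on the output); otherwise the loop would only time the enqueueing of work.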


Latency is measured at batchsize=1 (ms, smaller is better). Throughput is measured at batchsize=128 unless noted otherwise (fps, bigger is better).

| Category | Model | Latency w/o MKL-DNN | Latency w/ MKL-DNN | Latency speedup | Throughput w/o MKL-DNN | Throughput w/ MKL-DNN | Throughput speedup |
|---|---|---|---|---|---|---|---|
| CNN/classification | ResNet-50 v1 | 97.19 | 13.04 | 7.45 | 10.29 | 163.52 | 15.90 |
| CNN/classification | ResNet-50 v2 | 98.69 | 13.02 | 7.58 | 9.94 | 154.17 | 15.51 |
| CNN/classification | Inception v3 | 175.17 | 16.77 | 10.44 | 5.74 | 135.33 | 23.57 |
| CNN/classification | Inception v4 | 330.93 | 31.40 | 10.54 | 3.04 | 69.60 | 22.87 |
| CNN/classification | DenseNet | 111.66 | 18.90 | 5.91 | 8.52 | 149.88 | 17.60 |
| CNN/classification | MobileNet | 38.56 | 4.42 | 8.73 | 24.87 | 512.25 | 20.60 |
| CNN/classification | VGG16 | 406.50 | 20.07 | 20.25 | 2.91 | 70.84 | 24.31 |
| CNN/classification | AlexNet | 64.60 | 3.80 | 17.00 | 26.58 | 965.20 | 36.32 |
| CNN/classification | inception-resnet v2 | 181.10 | 49.40 | 3.67 | 5.48 | 82.97 | 15.14 |
| CNN/object detection | Faster R-CNN | 1175.74 | 118.62 | 9.91 | 0.85 | 8.57 | 10.08 |
| CNN/object detection | SSD-VGG16 | 721.03 | 47.62 | 15.14 | 1.43 (batchsize=224) | 28.90 (batchsize=224) | 19.13 |
| CNN/object detection | SSD-MobileNet | 239.40 | 28.33 | 8.45 | 4.07 (batchsize=256) | 69.97 (batchsize=256) | 14.18 |
| RNN | GNMT | 683.43 | 94.00 | 7.27 | 1.46 (batchsize=64) | 10.63 (batchsize=64) | 6.83 |
| GAN | DCGAN | 8.94 | 0.24 | 37.85 | 109.13 | 4249.36 | 38.94 |

Performance on AMD CPU with Intel MKL-DNN backend in release 1.3

The m5a.24xlarge instance offers 96 vCPUs on AMD EPYC processors (AVX2).


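Throughput is measured at batchsize=32 (fps, bigger is better).

| Category | Model | w/o MKL-DNN | w/ MKL-DNN | speedup |
|---|---|---|---|---|
| CNN/classification | ResNet-50 v1 | 2.44 | 38.57 | x15.8 |
| CNN/classification | MobileNet | 5.03 | 194.7 | x38.7 |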
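The speedup column is the ratio of the two measurements. As a small worked sketch (the helper name `speedup` is hypothetical), the AMD throughput figures above reproduce as follows; for latency, where smaller is better, the ratio is inverted.

```python
def speedup(baseline, optimized, higher_is_better=True):
    """Ratio by which `optimized` improves on `baseline`."""
    if higher_is_better:
        return optimized / baseline   # throughput: fps with / fps without
    return baseline / optimized       # latency: ms without / ms with


# Throughput values (fps) from the AMD table: (w/o MKL-DNN, w/ MKL-DNN)
amd_throughput = {
    "ResNet-50 v1": (2.44, 38.57),
    "MobileNet": (5.03, 194.7),
}

for model, (without_mkldnn, with_mkldnn) in amd_throughput.items():
    print(f"{model}: x{speedup(without_mkldnn, with_mkldnn):.1f}")
# ResNet-50 v1: x15.8
# MobileNet: x38.7
```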

Inference Accuracy

The c5.18xlarge instance offers a 2-socket Intel Xeon Platinum processor with 72 vCPUs.

The models come from the Gluon model zoo with pre-trained parameters. Top-1 and top-5 accuracy are verified with the MKL-DNN backend.

As the table below shows, MXNet 1.3 without and with MKL-DNN produces the same accuracy to within 10e-8.

Note: The dataset is ImageNet1k; the valdata/ directory is generated by imagenet1k-val.sh.
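For readers unfamiliar with the metrics, the sketch below shows how top-k accuracy is conventionally computed and how two backends can be compared within a tolerance. It is an illustration in plain Python, not the validation script used here; `topk_accuracy`, `same_within`, and the toy scores are assumptions made for the example.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


def same_within(a, b, eps=1e-8):
    """True if two accuracy values agree up to the tolerance eps."""
    return abs(a - b) <= eps


# Toy example: 3 samples, 3 classes (made-up scores, not benchmark data).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

top1 = topk_accuracy(scores, labels, k=1)  # only the first sample is correct
top2 = topk_accuracy(scores, labels, k=2)  # first and second samples are correct
print(top1, top2)
```

Running the same evaluation once per backend and passing the two results to `same_within` gives the kind of delta check summarized in the Delta columns.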

Inference Accuracy Comparison

| Alias | Network | # Parameters | CPU (without MKL-DNN) top1 | CPU (without MKL-DNN) top5 | CPU (with MKL-DNN) Backend top1 | CPU (with MKL-DNN) Backend top5 | Delta top1 | Delta top5 |
|---|---|---|---|---|---|---|---|---|
| alexnet | AlexNet | 61,100,840 | 0.56312500 | 0.78992188 | 0.56312500 | 0.78992188 | 0.00000000 | 0.00000000 |
| densenet121 | DenseNet-121 | 8,062,504 | 0.74203125 | 0.91929688 | 0.74203125 | 0.91929688 | 0.00000000 | 0.00000000 |
| densenet161 | DenseNet-161 | 28,900,936 | 0.77195313 | 0.93390625 | 0.77195313 | 0.93390625 | 0.00000000 | 0.00000000 |
| densenet169 | DenseNet-169 | 14,307,880 | 0.75710938 | 0.92828125 | 0.75710938 | 0.92828125 | 0.00000000 | 0.00000000 |
| densenet201 | DenseNet-201 | 20,242,984 | 0.76906250 | 0.93093750 | 0.76906250 | 0.93093750 | 0.00000000 | 0.00000000 |
| inceptionv3 | Inception V3 299x299 | 23,869,000 | 0.77609375 | 0.93664063 | 0.77609375 | 0.93664063 | 0.00000000 | 0.00000000 |
| mobilenet0.25 | MobileNet 0.25 | 475,544 | 0.51039063 | 0.75687500 | 0.51039063 | 0.75687500 | 0.00000000 | 0.00000000 |
| mobilenet0.5 | MobileNet 0.5 | 1,342,536 | 0.61851563 | 0.83789063 | 0.61851563 | 0.83789063 | 0.00000000 | 0.00000000 |
| mobilenet0.75 | MobileNet 0.75 | 2,601,976 | 0.66546875 | 0.87070313 | 0.66546875 | 0.87070313 | 0.00000000 | 0.00000000 |
| mobilenet1.0 | MobileNet 1.0 | 4,253,864 | 0.70093750 | 0.89109375 | 0.70093750 | 0.89109375 | 0.00000000 | 0.00000000 |
| mobilenetv2_1.0 | MobileNetV2 1.0 | 3,539,136 | 0.69976563 | 0.89281250 | 0.69976563 | 0.89281250 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.75 | MobileNetV2 0.75 | 2,653,864 | 0.68210938 | 0.88007813 | 0.68210938 | 0.88007813 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.5 | MobileNetV2 0.5 | 1,983,104 | 0.64453125 | 0.84929688 | 0.64453125 | 0.84929688 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.25 | MobileNetV2 0.25 | 1,526,856 | 0.50890625 | 0.74546875 | 0.50890625 | 0.74546875 | 0.00000000 | 0.00000000 |
| resnet18_v1 | ResNet-18 V1 | 11,699,112 | 0.70812500 | 0.89453125 | 0.70812500 | 0.89453125 | 0.00000000 | 0.00000000 |
| resnet34_v1 | ResNet-34 V1 | 21,814,696 | 0.73960938 | 0.91609375 | 0.73960938 | 0.91609375 | 0.00000000 | 0.00000000 |
| resnet50_v1 | ResNet-50 V1 | 25,629,032 | 0.76062500 | 0.93046875 | 0.76062500 | 0.93046875 | 0.00000000 | 0.00000000 |
| resnet101_v1 | ResNet-101 V1 | 44,695,144 | 0.77937500 | 0.93617188 | 0.77937500 | 0.93617188 | 0.00000000 | 0.00000000 |
| resnet152_v1 | ResNet-152 V1 | 60,404,072 | 0.78320313 | 0.93867188 | 0.78320313 | 0.93867188 | 0.00000000 | 0.00000000 |
| resnet18_v2 | ResNet-18 V2 | 11,695,796 | 0.71046875 | 0.89671875 | 0.71046875 | 0.89671875 | 0.00000000 | 0.00000000 |
| resnet34_v2 | ResNet-34 V2 | 21,811,380 | 0.74085938 | 0.91578125 | 0.74085938 | 0.91578125 | 0.00000000 | 0.00000000 |
| resnet50_v2 | ResNet-50 V2 | 25,595,060 | 0.76750000 | 0.93187500 | 0.76750000 | 0.93187500 | 0.00000000 | 0.00000000 |
| resnet101_v2 | ResNet-101 V2 | 44,639,412 | 0.78125000 | 0.94015625 | 0.78125000 | 0.94015625 | 0.00000000 | 0.00000000 |
| resnet152_v2 | ResNet-152 V2 | 60,329,140 | 0.78554688 | 0.94140625 | 0.78554688 | 0.94140625 | 0.00000000 | 0.00000000 |
| squeezenet1.0 | SqueezeNet 1.0 | 1,248,424 | 0.57273438 | 0.79554688 | 0.57273438 | 0.79554688 | 0.00000000 | 0.00000000 |
| squeezenet1.1 | SqueezeNet 1.1 | 1,235,496 | 0.57023438 | 0.79601563 | 0.57023438 | 0.79601563 | 0.00000000 | 0.00000000 |
| vgg11 | VGG-11 | 132,863,336 | 0.67062500 | 0.87531250 | 0.67062500 | 0.87531250 | 0.00000000 | 0.00000000 |
| vgg13 | VGG-13 | 133,047,848 | 0.68132813 | 0.87984375 | 0.68132813 | 0.87984375 | 0.00000000 | 0.00000000 |
| vgg16 | VGG-16 | 138,357,544 | 0.72062500 | 0.90585938 | 0.72062500 | 0.90585938 | 0.00000000 | 0.00000000 |
| vgg19 | VGG-19 | 143,667,240 | 0.73468750 | 0.91000000 | 0.73468750 | 0.91000000 | 0.00000000 | 0.00000000 |
| vgg11_bn | VGG-11 with batch normalization | 132,874,344 | 0.68953125 | 0.88882813 | 0.68953125 | 0.88882813 | 0.00000000 | 0.00000000 |
| vgg13_bn | VGG-13 with batch normalization | 133,059,624 | 0.69835938 | 0.88953125 | 0.69835938 | 0.88953125 | 0.00000000 | 0.00000000 |
| vgg16_bn | VGG-16 with batch normalization | 138,374,440 | 0.72226563 | 0.90390625 | 0.72226563 | 0.90390625 | 0.00000000 | 0.00000000 |
| vgg19_bn | VGG-19 with batch normalization | 143,689,256 | 0.72992188 | 0.90992188 | 0.72992188 | 0.90992188 | 0.00000000 | 0.00000000 |


Commands for Reproducing Results

Please access the script and model from the link below.

https://drive.google.com/open?id=17JenLnZKsmPoZIIyktINFfMjZtDY2Ehc 

(Note: select the parent folder and click download in the drop-down menu)

Refer to launch_benchmark_aws.sh to reproduce the results.