This page details benchmark results comparing MXNet 1.3.0 with and without MKL-DNN. The results show that MKL-DNN increases inference throughput by about 6.8x to 37x and reduces latency by about 1.6x to 41x, while top-1 and top-5 accuracy agree to within 10e-8.

Inference Performance

The performance results in this section were collected on an AWS EC2 c5.18xlarge instance using a single socket (one processor).

Using both sockets provides about a 2x throughput speedup, while latency stays constant.

Performance boost with Intel MKL-DNN backend in release 1.3

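The c5.18xlarge instance has two Intel Xeon Platinum sockets with 72 vCPUs in total. The settings below run the benchmark with 18 OpenMP threads bound to CPUs 0-17 and NUMA memory node 0: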

$ export KMP_AFFINITY=granularity=fine,compact,1,0

$ export OMP_NUM_THREADS=18

$ numactl --physcpubind=0-17 --membind=0 python …
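
The Python command above is elided, so the benchmark script itself is not shown here. As a rough illustration only, latency and throughput figures of this kind can be derived from wall-clock timings as sketched below. The `benchmark` helper and the `fake_inference` stand-in are hypothetical names introduced for this example and are not part of MXNet or of the original benchmark.

```python
import time


def benchmark(run_batch, batch_size, iterations=20, warmup=5):
    """Time run_batch() and return (latency in ms per batch, samples per second)."""
    for _ in range(warmup):
        run_batch()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iterations * 1000.0
    throughput = batch_size * iterations / elapsed
    return latency_ms, throughput


def fake_inference():
    # Hypothetical stand-in for one forward pass; replace with a real model call.
    time.sleep(0.002)


# Latency is reported at batch size 1, throughput at batch size 128.
latency_ms, _ = benchmark(fake_inference, batch_size=1)
_, fps = benchmark(fake_inference, batch_size=128)
print(f"latency: {latency_ms:.2f} ms, throughput: {fps:.1f} samples/s")
```

Because MXNet executes operators asynchronously, a real `run_batch` would need to wait for the result (for example with `mx.nd.waitall()`) before returning, otherwise the timer would stop too early.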

Category	Model	Latency, batch size 1 (ms, lower is better)			Throughput, batch size 128 (fps, higher is better)
		no MKL-DNN	release 1.3 + MKL-DNN	speedup	no MKL-DNN	release 1.3 + MKL-DNN	speedup
CNN/classification	ResNet-50 v1	97.19	18.94	5.13	10.29	132.05	12.84
	ResNet-50 v2	98.69	18.93	5.21	9.94	127.17	12.79
	Inception v3	175.17	26.34	6.65	5.74	110.00	19.16
	Inception v4	330.93	66.96	4.94	3.04	59.28	19.47
	DenseNet	111.66	53.31	2.09	8.52	121.79	14.30
	MobileNet	38.56	7.32	5.27	24.87	380.54	15.30
	VGG16	406.50	40.08	10.14	2.91	69.84	23.96
	AlexNet	64.60	4.33	14.90	26.58	689.86	25.96
	Inception-ResNet v2	181.10	111.28	1.63	5.48	69.39	12.66
CNN/object detection	Faster R-CNN	1175.74	95.15	12.36	0.85	10.51	12.36
	SSD-VGG16	721.03	127.48	5.66	1.43 (batchsize=224)	27.35 (batchsize=224)	19.13
	SSD-MobileNet	239.40	100.75	2.39	4.07 (batchsize=256)	57.73 (batchsize=256)	14.18
RNN	GNMT	683.43	100.30	6.81	1.46 (batchsize=64)	9.97 (batchsize=64)	6.83
GAN	DCGAN	8.94	0.22	41.36	109.13	4059.74	37.20
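
The speedup columns follow from the measured values: for latency, the time without MKL-DNN is divided by the time with MKL-DNN; for throughput, the ratio is inverted. The sketch below recomputes the speedups for three rows copied from the table. The `speedups` helper is a name chosen for this example. Recomputed values can differ from the published ones in the last digit, presumably because the table entries are rounded.

```python
# (model, latency_ms without, latency_ms with, fps without, fps with),
# copied from the table above.
rows = [
    ("ResNet-50 v1", 97.19, 18.94, 10.29, 132.05),
    ("VGG16", 406.50, 40.08, 2.91, 69.84),
    ("AlexNet", 64.60, 4.33, 26.58, 689.86),
]


def speedups(lat_base, lat_mkldnn, fps_base, fps_mkldnn):
    """Return (latency speedup, throughput speedup) of MKL-DNN over the baseline."""
    # Lower latency is better, higher throughput is better, hence the inverted ratios.
    return lat_base / lat_mkldnn, fps_mkldnn / fps_base


for model, *numbers in rows:
    lat_x, fps_x = speedups(*numbers)
    print(f"{model}: latency {lat_x:.2f}x, throughput {fps_x:.2f}x")
```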

Inference Accuracy

The models come from the Gluon model zoo and use its pre-trained parameters. Top-1 and top-5 accuracy were verified with the MKL-DNN backend.

As the table below shows, MXNet 1.3 produces the same accuracy with and without MKL-DNN, to within 10e-8.

Note: The ImageNet1k validation dataset (valdata/) was generated with https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/data/imagenet1k-val.sh
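
For readers unfamiliar with the metric, top-k accuracy is the fraction of samples whose true label is among the k classes with the highest predicted scores. The sketch below shows the computation in plain Python. It is not the evaluation script used for this page, and the scores and labels are made-up illustration data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 3 samples over 6 classes (not benchmark data).
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # true label 1 has the highest score
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true label 2 ranks third
    [0.40, 0.30, 0.10, 0.10, 0.06, 0.04],  # true label 5 ranks last
]
labels = [1, 2, 5]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"top1={top1:.4f} top5={top5:.4f}")
```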

Inference Accuracy Comparison
Alias	Network	# Parameters	CPU (without MKL-DNN)		CPU (with MKL-DNN backend)
			top1	top5	top1	top5
alexnet	AlexNet	61,100,840	0.563125	0.78992188	0.563125	0.7899219
densenet121	DenseNet-121	8,062,504	0.74203125	0.91929688	0.7420313	0.9192969
densenet161	DenseNet-161	28,900,936	0.77195313	0.93390625	0.7719531	0.9339063
densenet169	DenseNet-169	14,307,880	0.75710938	0.92828125	0.7571094	0.9282813
densenet201	DenseNet-201	20,242,984	0.7690625	0.9309375	0.7690625	0.9309375
inceptionv3	Inception V3 299x299	23,869,000	0.77609375	0.93664063	0.7760938	0.9366406
mobilenet0.25	MobileNet 0.25	475,544	0.51039063	0.756875	0.5103906	0.756875
mobilenet0.5	MobileNet 0.5	1,342,536	0.61851563	0.83789063	0.6185156	0.8378906
mobilenet0.75	MobileNet 0.75	2,601,976	0.66546875	0.87070313	0.6654688	0.8707031
mobilenet1.0	MobileNet 1.0	4,253,864	0.7009375	0.89109375	0.7009375	0.8910938
mobilenetv2_1.0	MobileNetV2 1.0	3,539,136	0.69976563	0.8928125	0.6997656	0.8928125
mobilenetv2_0.75	MobileNetV2 0.75	2,653,864	0.68210938	0.88007813	0.6821094	0.8800781
mobilenetv2_0.5	MobileNetV2 0.5	1,983,104	0.64453125	0.84929688	0.6445313	0.8492969
mobilenetv2_0.25	MobileNetV2 0.25	1,526,856	0.50890625	0.74546875	0.5089063	0.7454688
resnet18_v1	ResNet-18 V1	11,699,112	0.708125	0.89453125	0.708125	0.8945313
resnet34_v1	ResNet-34 V1	21,814,696	0.73960938	0.91609375	0.7396094	0.9160938
resnet50_v1	ResNet-50 V1	25,629,032	0.760625	0.93046875	0.760625	0.9304688
resnet101_v1	ResNet-101 V1	44,695,144	0.779375	0.93617188	0.779375	0.9361719
resnet152_v1	ResNet-152 V1	60,404,072	0.78320313	0.93867188	0.7832031	0.9386719
resnet18_v2	ResNet-18 V2	11,695,796	0.71046875	0.89671875	0.7104688	0.8967188
resnet34_v2	ResNet-34 V2	21,811,380	0.74085938	0.91578125	0.7408594	0.9157813
resnet50_v2	ResNet-50 V2	25,595,060	0.7675	0.931875	0.7675	0.931875
resnet101_v2	ResNet-101 V2	44,639,412	0.78125	0.94015625	0.78125	0.9401563
resnet152_v2	ResNet-152 V2	60,329,140	0.78554688	0.94140625	0.7855469	0.9414063
squeezenet1.0	SqueezeNet 1.0	1,248,424	0.57273438	0.79554688	0.5727344	0.7955469
squeezenet1.1	SqueezeNet 1.1	1,235,496	0.57023438	0.79601563	0.5702344	0.7960156
vgg11	VGG-11	132,863,336	0.670625	0.8753125	0.670625	0.8753125
vgg13	VGG-13	133,047,848	0.68132813	0.87984375	0.6813281	0.8798438
vgg16	VGG-16	138,357,544	0.720625	0.90585938	0.720625	0.9058594
vgg19	VGG-19	143,667,240	0.7346875	0.91	0.7346875	0.91
vgg11_bn	VGG-11 with batch normalization	132,874,344	0.68953125	0.88882813	0.6895313	0.8888281
vgg13_bn	VGG-13 with batch normalization	133,059,624	0.69835938	0.88953125	0.6983594	0.8895313
vgg16_bn	VGG-16 with batch normalization	138,374,440	0.72226563	0.90390625	0.7222656	0.9039063
vgg19_bn	VGG-19 with batch normalization	143,689,256	0.72992188	0.90992188	0.7299219	0.9099219
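
The two sets of columns are printed with different numbers of digits, so the agreement is easiest to confirm with an absolute tolerance rather than by comparing strings. The sketch below checks three rows copied from the table against the 10e-8 tolerance quoted in the text. The `matches` helper is a name chosen for this example.

```python
import math

# (alias, top1 without, top5 without, top1 with, top5 with),
# copied from the table above.
rows = [
    ("alexnet", 0.563125, 0.78992188, 0.563125, 0.7899219),
    ("resnet50_v1", 0.760625, 0.93046875, 0.760625, 0.9304688),
    ("vgg16", 0.720625, 0.90585938, 0.720625, 0.9058594),
]

TOLERANCE = 10e-8  # tolerance quoted in the text, equal to 1e-7


def matches(row, tol=TOLERANCE):
    """True if top1 and top5 agree between the two backends within tol."""
    _, top1_a, top5_a, top1_b, top5_b = row
    return (math.isclose(top1_a, top1_b, rel_tol=0.0, abs_tol=tol)
            and math.isclose(top5_a, top5_b, rel_tol=0.0, abs_tol=tol))


mismatches = [row[0] for row in rows if not matches(row)]
print("mismatches:", mismatches)
```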
