Inference Performance

The performance results in this section were gathered on an AWS EC2 C5.18xlarge instance using a single socket (one processor).

For throughput, 2 sockets provide about a 2x speedup, while latency remains roughly constant.

  • Performance boost with the Intel MKL-DNN backend in release 1.3 (see the measurement sketch after the table below)

    • w/o MKL-DNN: pip install mxnet==1.3.0

    • w/ MKL-DNN: pip install mxnet-mkl==1.3.0

Latency is measured at batchsize=1 (ms, lower is better) and throughput at batchsize=128 (fps, higher is better), unless a different batch size is noted in the cell.

| Category | Model | Latency, no MKL-DNN (ms) | Latency, 1.3 + MKL-DNN (ms) | Latency speedup | Throughput, no MKL-DNN (fps) | Throughput, 1.3 + MKL-DNN (fps) | Throughput speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN/classification | ResNet-50 v1 | 97.19 | 18.94 | 5.13 | 10.29 | 132.05 | 12.84 |
| | ResNet-50 v2 | 98.69 | 18.93 | 5.21 | 9.94 | 127.17 | 12.79 |
| | Inception v3 | 175.17 | 26.34 | 6.65 | 5.74 | 110.00 | 19.16 |
| | Inception v4 | 330.93 | 66.96 | 4.94 | 3.04 | 59.28 | 19.47 |
| | DenseNet | 111.66 | 53.31 | 2.09 | 8.52 | 121.79 | 14.30 |
| | MobileNet | 38.56 | 7.32 | 5.27 | 24.87 | 380.54 | 15.30 |
| | VGG16 | 406.50 | 40.08 | 10.14 | 2.91 | 69.84 | 23.96 |
| | AlexNet | 64.60 | 4.33 | 14.90 | 26.58 | 689.86 | 25.96 |
| | Inception-ResNet v2 | 181.10 | 111.28 | 1.63 | 5.48 | 69.39 | 12.66 |
| CNN/object detection | Faster R-CNN | 1175.74 | 95.15 | 12.36 | 0.85 | 10.51 | 12.36 |
| | SSD-VGG16 | 721.03 | 127.48 | 5.66 | 1.43 (batchsize=224) | 27.35 (batchsize=224) | 19.13 |
| | SSD-MobileNet | | 100.75 | | | 57.73 (batchsize=256) | |
| RNN | GNMT | 683.43 | 100.30 | 6.81 | 1.46 (batchsize=64) | 9.97 (batchsize=64) | 6.83 |
| GAN | DCGAN | 8.94 | 0.22 | 41.36 | 109.13 | 4059.74 | 37.20 |
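Numbers like those above can be reproduced with a short script. The following is only a minimal sketch, assuming the Gluon model zoo ResNet-50 v1 model, 224x224 inputs, and simple warm-up/iteration counts; the actual benchmark scripts and the single-socket thread pinning described above may differ.

```python
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

def benchmark(batch_size, num_iter=100, warmup=10):
    ctx = mx.cpu()
    net = vision.resnet50_v1(pretrained=True, ctx=ctx)
    net.hybridize()
    data = mx.nd.random.uniform(shape=(batch_size, 3, 224, 224), ctx=ctx)
    # Warm-up runs so that graph construction and (with mxnet-mkl)
    # MKL-DNN primitive creation are excluded from the timing.
    for _ in range(warmup):
        net(data).wait_to_read()
    start = time.time()
    for _ in range(num_iter):
        net(data).wait_to_read()
    elapsed = time.time() - start
    latency_ms = elapsed / num_iter * 1000.0
    throughput_fps = batch_size * num_iter / elapsed
    return latency_ms, throughput_fps

# Latency is reported at batchsize=1, throughput at batchsize=128.
print("latency (ms): %.2f" % benchmark(1)[0])
print("throughput (fps): %.2f" % benchmark(128)[1])
```

Whether the MKL-DNN backend is used depends on which package is installed (mxnet vs. mxnet-mkl); OpenMP thread settings should match the single-socket configuration used for the table.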


  • Performance gain from operator fusion by subgraph

Latency is measured at batchsize=1 (ms, lower is better) and throughput at batchsize=128 (fps, higher is better).

| Category | Model | Latency, R1.3 w/ MKL-DNN (ms) | Latency, master w/ subgraph (ms) | Latency speedup | Throughput, R1.3 w/ MKL-DNN (fps) | Throughput, master w/ subgraph (fps) | Throughput speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN/classification | ResNet-50 v1 | | | | | | |
| | ResNet-50 v2 | | | | | | |
| | Inception v3 | | | | | | |
| | Inception v4 | | | | | | |
| | DenseNet | | | | | | |
| | MobileNet | | | | | | |
| | VGG16 | | | | | | |
| | AlexNet | | | | | | |
| | Inception-ResNet v2 | | | | | | |
| CNN/object detection | Faster R-CNN | | | | | | |
| | SSD-VGG16 | | | | | | |
| | SSD-MobileNet | | | | | | |
| RNN | GNMT | | | | | | |
| GAN | DCGAN | | | | | | |
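The subgraph fusion path is selected at runtime. As a rough sketch (assuming a master build with MKL-DNN subgraph support), the MXNET_SUBGRAPH_BACKEND environment variable can be set to MKLDNN before running the same benchmark, so that eligible operator sequences are fused into MKL-DNN subgraph nodes:

```python
# Sketch: enable the MKL-DNN subgraph (operator fusion) backend.
# Assumption: requires a build that includes subgraph support
# (master at the time of writing); set the variable before importing mxnet.
import os
os.environ['MXNET_SUBGRAPH_BACKEND'] = 'MKLDNN'

import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v1(pretrained=True, ctx=mx.cpu())
net.hybridize()
out = net(mx.nd.ones((1, 3, 224, 224)))
out.wait_to_read()
```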




Inference Accuracy

The models are taken from the Gluon model zoo with pre-trained parameters. Top-1 and top-5 accuracy are verified with the MKL-DNN backend.

Inference Accuracy Comparison
| Alias | Network | # Parameters | GPU (cuDNN) top-1 | GPU (cuDNN) top-5 | CPU (no MKL-DNN) top-1 | CPU (no MKL-DNN) top-5 | CPU (MKL-DNN) top-1 | CPU (MKL-DNN) top-5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| alexnet | AlexNet | 61,100,840 | | | | | | |
| densenet121 | DenseNet-121 | 8,062,504 | | | | | | |
| densenet161 | DenseNet-161 | 28,900,936 | | | | | | |
| densenet169 | DenseNet-169 | 14,307,880 | | | | | | |
| densenet201 | DenseNet-201 | 20,242,984 | | | | | | |
| inceptionv3 | Inception V3 299x299 | 23,869,000 | | | | | | |
| mobilenet0.25 | MobileNet 0.25 | 475,544 | | | | | | |
| mobilenet0.5 | MobileNet 0.5 | 1,342,536 | | | | | | |
| mobilenet0.75 | MobileNet 0.75 | 2,601,976 | | | | | | |
| mobilenet1.0 | MobileNet 1.0 | 4,253,864 | | | | | | |
| mobilenetv2_1.0 | MobileNetV2 1.0 | 3,539,136 | | | | | | |
| mobilenetv2_0.75 | MobileNetV2 0.75 | 2,653,864 | | | | | | |
| mobilenetv2_0.5 | MobileNetV2 0.5 | 1,983,104 | | | | | | |
| mobilenetv2_0.25 | MobileNetV2 0.25 | 1,526,856 | | | | | | |
| resnet18_v1 | ResNet-18 V1 | 11,699,112 | | | | | | |
| resnet34_v1 | ResNet-34 V1 | 21,814,696 | | | | | | |
| resnet50_v1 | ResNet-50 V1 | 25,629,032 | | | | | | |
| resnet101_v1 | ResNet-101 V1 | 44,695,144 | | | | | | |
| resnet152_v1 | ResNet-152 V1 | 60,404,072 | | | | | | |
| resnet18_v2 | ResNet-18 V2 | 11,695,796 | | | | | | |
| resnet34_v2 | ResNet-34 V2 | 21,811,380 | | | | | | |
| resnet50_v2 | ResNet-50 V2 | 25,595,060 | | | | | | |
| resnet101_v2 | ResNet-101 V2 | 44,639,412 | | | | | | |
| resnet152_v2 | ResNet-152 V2 | 60,329,140 | | | | | | |
| squeezenet1.0 | SqueezeNet 1.0 | 1,248,424 | | | | | | |
| squeezenet1.1 | SqueezeNet 1.1 | 1,235,496 | | | | | | |
| vgg11 | VGG-11 | 132,863,336 | | | | | | |
| vgg13 | VGG-13 | 133,047,848 | | | | | | |
| vgg16 | VGG-16 | 138,357,544 | | | | | | |
| vgg19 | VGG-19 | 143,667,240 | | | | | | |
| vgg11_bn | VGG-11 with batch normalization | 132,874,344 | | | | | | |
| vgg13_bn | VGG-13 with batch normalization | 133,059,624 | | | | | | |
| vgg16_bn | VGG-16 with batch normalization | 138,374,440 | | | | | | |
| vgg19_bn | VGG-19 with batch normalization | 143,689,256 | | | | | | |
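A minimal sketch of how the rows above can be produced: each model is loaded by its alias from the Gluon model zoo with pretrained=True, its parameters are counted, and top-1/top-5 accuracy are computed with the standard metrics. `val_data`, an ImageNet validation iterator yielding preprocessed (data, label) batches, is a placeholder that must be supplied separately and is not part of this page.

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

def evaluate(alias, val_data, ctx=mx.cpu()):
    # Load the pre-trained model by its model-zoo alias (e.g. 'resnet50_v1').
    net = vision.get_model(alias, pretrained=True, ctx=ctx)
    num_params = sum(p.data(ctx).size for p in net.collect_params().values())
    top1 = mx.metric.Accuracy()
    top5 = mx.metric.TopKAccuracy(top_k=5)
    for data, label in val_data:
        output = net(data.as_in_context(ctx))
        top1.update([label.as_in_context(ctx)], [output])
        top5.update([label.as_in_context(ctx)], [output])
    return num_params, top1.get()[1], top5.get()[1]

# Running the same evaluation with mxnet vs. mxnet-mkl (and on GPU with cuDNN)
# fills in the corresponding columns of the table.
```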