For throughput, using 2 sockets provides about a 2x speedup, while latency stays constant.

Performance boost on Intel CPU with Intel MKL-DNN backend in release 1.3

The c5.18xlarge instance offers a 2-socket Intel Xeon Platinum processor with 72 vCPUs.

...

Category	Model	Latency, batchsize=1 (ms, lower is better)			Throughput, batchsize=128 (fps, higher is better)
Category	Model	w/o MKL-DNN	w/ MKL-DNN	speedup	w/o MKL-DNN	w/ MKL-DNN	speedup
CNN/classification	ResNet-50 v1	97.19	13.04	7.45	10.29	163.52	15.90
	ResNet-50 v2	98.69	13.02	7.58	9.94	154.17	15.51
	Inception v3	175.17	16.77	10.44	5.74	135.33	23.57
	Inception v4	330.93	31.40	10.54	3.04	69.60	22.87
	DenseNet	111.66	18.90	5.91	8.52	149.88	17.60
	MobileNet	38.56	4.42	8.73	24.87	512.25	20.60
	VGG16	406.50	20.07	20.25	2.91	70.84	24.31
	AlexNet	64.60	3.80	17.00	26.58	965.20	36.32
	Inception-ResNet v2	181.10	49.40	3.67	5.48	82.97	15.14
CNN/object detection	Faster R-CNN	1175.74	118.62	9.91	0.85	8.57	10.08
	SSD-VGG16	721.03	47.62	15.14	1.43(batchsize=224)	28.90(batchsize=224)	19.13
	SSD-MobileNet	239.40	28.33	8.45	4.07(batchsize=256)	69.97(batchsize=256)	14.18
RNN	GNMT	683.43	94.00	7.27	1.46(batchsize=64)	10.63(batchsize=64)	6.83
GAN	DCGAN	8.94	0.24	37.85	109.13	4249.36	38.94
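
The speedup columns appear to be simple ratios of the two measurements. Latency is lower-is-better, so the baseline is divided by the MKL-DNN value; throughput is higher-is-better, so the ratio is inverted. The following is a minimal sketch under that assumption. The function names are illustrative and are not part of MXNet. Results computed from the rounded table values may differ slightly from the listed speedups, which were presumably derived from unrounded measurements.

```python
def latency_speedup(baseline_ms, mkldnn_ms):
    """Latency is lower-is-better: speedup = baseline / optimized."""
    return baseline_ms / mkldnn_ms


def throughput_speedup(baseline_fps, mkldnn_fps):
    """Throughput is higher-is-better: speedup = optimized / baseline."""
    return mkldnn_fps / baseline_fps


# Two rows copied from the table above:
# (latency w/o, latency w/, throughput w/o, throughput w/)
rows = {
    "ResNet-50 v1": (97.19, 13.04, 10.29, 163.52),
    "AlexNet": (64.60, 3.80, 26.58, 965.20),
}

for model, (lat_base, lat_mkl, fps_base, fps_mkl) in rows.items():
    print(
        f"{model}: latency speedup {latency_speedup(lat_base, lat_mkl):.2f}x, "
        f"throughput speedup {throughput_speedup(fps_base, fps_mkl):.2f}x"
    )
```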

Inference Accuracy

The c5.18xlarge instance offers a 2-socket Intel Xeon Platinum processor with 72 vCPUs.

The models are taken from the Gluon model zoo with pre-trained parameters. The top-1 and top-5 accuracy are verified with the MKL-DNN backend.
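
For readers unfamiliar with the metric, top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The sketch below shows the computation in plain Python. It is not the evaluation script used for these results, and the scores and labels are made-up toy data with three classes, so k=2 stands in for top-5.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical per-class scores for three samples (toy data).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```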

...