(authored by Kellen Sunderland)

This document describes how to use the MXNet-TensorRT runtime integration to accelerate model inference.

...

TensorRT can greatly speed up inference of deep learning models. One experiment on a Titan V (V100) GPU shows that with MXNet 1.2, we can get an approximately 3x speed-up when running inference of the ResNet-50 model on the CIFAR-10 dataset in single precision (fp32). As batch sizes and image sizes go up (for CNN inference), the benefit may be less pronounced, but in general TensorRT helps most with:

  • Models with many bandwidth-bound or latency-bound layers (e.g. pointwise operations) that benefit from GPU kernel fusion.
  • Inference use cases with tight latency requirements, where the client application can't wait for large batches to be queued up.
  • Embedded systems, where memory constraints are tighter than on servers.
  • Inference in reduced precision, especially integer (e.g. int8) inference.

...

The above points ensure that we find a compromise between the flexibility of MXNet and fast inference in TensorRT, with no additional burden to the user: users do not need to learn how the TensorRT APIs work, and do not need to write their own client application or data pipeline.
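
To illustrate how small the change is, the sketch below shows roughly what TensorRT-accelerated inference looks like from Python. It assumes the contrib API that shipped around MXNet 1.3 (mx.contrib.tensorrt.tensorrt_bind and the MXNET_USE_TENSORRT environment variable) and uses a placeholder checkpoint prefix 'resnet-50'; exact function names and signatures may differ between releases, so treat this as a sketch rather than canonical usage.

Code Block
# Sketch only: assumes a TensorRT-enabled MXNet build (~1.3) and the contrib
# API mx.contrib.tensorrt.tensorrt_bind; adjust names to your MXNet version.
import os
import mxnet as mx

os.environ['MXNET_USE_TENSORRT'] = '1'   # enable partitioning of the graph into TensorRT engines

batch_shape = (1, 3, 32, 32)             # CIFAR-10 sized input, NCHW
# 'resnet-50' is a placeholder checkpoint prefix; substitute your own model files.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)

# tensorrt_bind takes the place of simple_bind: it returns an executor in which
# the TensorRT-compatible subgraphs run as TensorRT engines.
all_params = {k: v.as_in_context(mx.gpu(0))
              for k, v in list(arg_params.items()) + list(aux_params.items())}
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0), all_params=all_params,
                                             data=batch_shape,
                                             softmax_label=(1,),   # label shape, since the checkpointed symbol ends in SoftmaxOutput
                                             grad_req='null')

# Run a forward pass exactly as with a regular MXNet executor.
input_data = mx.nd.zeros(batch_shape, ctx=mx.gpu(0))
executor.forward(is_train=False, data=input_data)
print(executor.outputs[0].asnumpy().shape)

Apart from the environment variable and the bind call, nothing changes relative to a plain MXNet inference script, which is the point of the integration.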

How do I

...

run models with TensorRT integration?

Building MXNet together with TensorRT is somewhat complex. The recipe will hopefully be simplified in the near future, but for now it's easiest to build a Docker container with an Ubuntu 16.04 base. The Dockerfile can be found under the ci subdirectory of the MXNet repository. You can optionally build the container yourself as follows, but we recommend skipping this step and using the DockerHub-hosted version of the MXNet TensorRT containers:

Code Block
# Run from the root of the MXNet repository
docker build -f ci/docker/Dockerfile.build.ubuntu_gpu_tensorrt -t mxnet_with_tensorrt .

...

Code Block
nvidia-docker run -ti --rm mxnet_with_tensorrt bash
After starting the container, you will find yourself in the /opt/mxnet directory by default.
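
As a quick sanity check before running anything larger, you can confirm from inside the container that MXNet imports and that the TensorRT toggle is visible from Python. This again assumes the MXNet 1.3-era contrib module; the helper names (set_use_tensorrt, get_use_tensorrt) are illustrative and may differ in other releases.

Code Block
# Sanity check (sketch): verify MXNet imports and that the TensorRT toggle is exposed.
import mxnet as mx
from mxnet.contrib import tensorrt as trt

print(mx.__version__)
trt.set_use_tensorrt(True)        # equivalent to exporting MXNET_USE_TENSORRT=1
print(trt.get_use_tensorrt())     # expect True on a TensorRT-enabled build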

...