This document provides a detailed description of the MXNet-TensorRT runtime integration feature. It covers advanced techniques, contains a roadmap reflecting the current state of the feature and future directions, and includes up-to-date benchmarks. If you'd like a quick overview of the feature with a tutorial describing a simple use case, please refer to the MXNet-hosted tutorial. For more information, you may also visit the original design proposal page.

Table of Contents

Why is TensorRT integration useful?

...

Initial integration has been completed and launched as of MXNet 1.3. We've tested this integration against a variety of models, including all of the GluonCV models, WaveNet, and some custom computer vision models. Performance is roughly in line with expectations, although we're seeing a few regressions relative to earlier measurements that require investigation.

Continuous integration support is enabled and runs for all active PRs opened against MXNet.

...

https://jira.apache.org/jira/browse/MXNET-1085

Conditional Checkout and Compilation of Dependencies

TensorRT integration required us to add a number of third-party code sub-repositories to the project. This is not ideal for users who would like to check out and build MXNet without using the TensorRT feature. In the future we should make the feature CMake-only and check out these dependencies at pre-compilation time, so that we avoid forcing all users to clone the subrepos. We can also model these dependencies in CMake so that they're automatically built and linked against when required, which would make building from scratch easier for those who do want to use TensorRT integration.
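As a rough sketch of what opt-in dependency modeling could look like in CMake (the option name, repository URL, tag, and target names below are illustrative assumptions, not the project's actual build configuration):

```cmake
# Hypothetical sketch: fetch and build TensorRT-related third-party code only
# when the user opts in, instead of requiring the subrepos for every checkout.
option(USE_TENSORRT "Build MXNet with TensorRT runtime integration" OFF)

if(USE_TENSORRT)
  include(FetchContent)

  # Example third-party dependency; URL and tag are illustrative.
  FetchContent_Declare(
    onnx_tensorrt
    GIT_REPOSITORY https://github.com/onnx/onnx-tensorrt.git
    GIT_TAG        master
  )
  FetchContent_MakeAvailable(onnx_tensorrt)

  target_compile_definitions(mxnet PRIVATE MXNET_USE_TENSORRT=1)
  target_link_libraries(mxnet PRIVATE nvonnxparser)
endif()
```

With this shape, a default build never touches the TensorRT subrepos, while `-DUSE_TENSORRT=ON` pulls and builds them automatically.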

Make use of Cached TRT Engines

Similar to the cuDNN auto-tuning feature, we've received requests from users to cache compiled TensorRT engines so that we avoid the delay of rebuilding the engine each time the process starts.
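A minimal sketch of how such a cache might work, keyed on the serialized graph plus input shapes (the key derivation, on-disk layout, and `build_fn` callback are assumptions for illustration, not MXNet's actual API):

```python
import hashlib
import os


def engine_cache_key(serialized_graph: bytes, shapes: dict) -> str:
    """Derive a stable cache key from the model graph and its input shapes."""
    h = hashlib.sha256(serialized_graph)
    for name in sorted(shapes):
        h.update(name.encode())
        h.update(repr(shapes[name]).encode())
    return h.hexdigest()


def get_or_build_engine(cache_dir, serialized_graph, shapes, build_fn):
    """Return cached engine bytes if present; otherwise build and cache them."""
    key = engine_cache_key(serialized_graph, shapes)
    path = os.path.join(cache_dir, key + ".trt")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    engine = build_fn(serialized_graph)  # the expensive TensorRT build step
    with open(path, "wb") as f:
        f.write(engine)
    return engine
```

On the second run of the same process (or a later process), the expensive build callback is skipped entirely and the serialized engine is loaded from disk.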

Jira: MXNET-1152

Increased Operator (/Layer) Coverage

...

Jira: MXNET-1086


Decouple NNVM-to-ONNX from NNVM-to-TensorRT in MXNet

Jira: MXNET-1252

The current nnvm_to_onnx classes are tightly coupled to TensorRT. We could extract all of the TensorRT-specific functionality and create a proper separation between nnvm_to_onnx and onnx_to_tensorrt. When structuring nnvm_to_onnx we should use an object hierarchy that converts to specific ONNX opsets, which would help us maintain compatibility with different toolsets. We should create a base class that performs generic ONNX conversions, then specialized classes that inherit from it and handle the differences between opsets. We should also create unit tests on a per-op basis to make sure we're not introducing regressions.
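A sketch of what that hierarchy could look like (class and method names here are hypothetical, not the actual nnvm_to_onnx API; the Pad example reflects the real ONNX change in opset 11, where pads moved from an attribute to a tensor input, though the node encoding is simplified):

```python
class OnnxConverterBase:
    """Performs generic NNVM-to-ONNX conversions shared by all opsets."""
    opset_version = None

    def convert_flatten(self, node):
        # Flatten has been stable across opsets; no specialization needed.
        return {"op_type": "Flatten", "inputs": node["inputs"]}

    def convert_pad(self, node):
        # Older opsets carry pads as an attribute; subclasses may override.
        return {"op_type": "Pad", "inputs": node["inputs"],
                "attrs": {"pads": node["attrs"]["pads"]}}


class OnnxConverterOpset11(OnnxConverterBase):
    """Overrides only the ops whose ONNX definition changed in opset 11."""
    opset_version = 11

    def convert_pad(self, node):
        # Opset 11 takes pads as an extra tensor input instead of an attribute.
        return {"op_type": "Pad",
                "inputs": node["inputs"] + [node["attrs"]["pads"]]}
```

Per-op unit tests would then exercise each `convert_*` method against every supported opset class, catching regressions when a new opset is added.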


Currently supported operators:

| Operator Name  | Operator Description | Status   |
|----------------|----------------------|----------|
| Convolution    |                      | Complete |
| BatchNorm      |                      | Complete |
| elemwise_add   |                      | Complete |
| elemwise_sub   |                      | Complete |
| elemwise_mul   |                      | Complete |
| rsqrt          |                      | Complete |
| Pad            |                      | Complete |
| mean           |                      | Complete |
| FullyConnected |                      | Complete |
| Flatten        |                      | Complete |
| SoftmaxOutput  |                      | Complete |
| Activation     | relu, tanh, sigmoid  | Complete |


Operators to be added:


| Operator Name    | Operator Description                         | Status      |
|------------------|----------------------------------------------|-------------|
| Deconvolution Op | Required for several computer vision models. | In Progress |
| elemwise_div     | Required for some WaveNet implementations.   | In Progress |


Benchmarks

TensorRT integration is still an experimental feature, so benchmarks are likely to improve over time. As of Oct 11, 2018, we've measured the following improvements, all with networks using FP32 weights.


| Model Name             | Relative TensorRT Speedup | Hardware   |
|------------------------|---------------------------|------------|
| cifar_resnet20_v2      | 1.21x                     | Titan V    |
| cifar_resnext29_16x64d | 1.26x                     | Titan V    |
| Resnet 18              | 1.8x                      | Titan V    |
| Resnet 18              | 1.54x                     | Jetson TX1 |
| Resnet 50              | 1.76x                     | Titan V    |
| Resnet 101             | 1.99x                     | Titan V    |
| Alexnet                | 1.4x                      | Titan V    |


https://mxnet.incubator.apache.org/tutorials/tensorrt/inference_with_trt.html

...