...

A full tutorial is provided here, but we'll summarize a simple use case below.

Installation

Installing MXNet with TensorRT integration is an easy process. First, ensure that you are running Ubuntu 16.04, that you have updated your video drivers, and that you have installed CUDA 9.0 or 9.2. You'll need a Pascal or newer generation NVIDIA GPU. You'll also have to download and install the TensorRT libraries (instructions here). Once these prerequisites are installed and up-to-date, you can install a special build of MXNet with TensorRT support enabled via PyPI and pip. Install the appropriate version by running:

...
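
The exact command is elided above, but as a hedged sketch it likely takes the following form. The mxnet-tensorrt-cu92 package name is an assumption based on the CUDA versions mentioned above, not something this page spells out:

Code Block
# Assumed package name for the CUDA 9.2 build; a cu90 variant would
# correspond to CUDA 9.0.
pip install mxnet-tensorrt-cu92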

Alternatively, you can run everything below from the prebuilt mxnet/tensorrt Docker image:

Code Block
nvidia-docker run -ti mxnet/tensorrt bash

Model Initialization

Code Block
import mxnet as mx
from mxnet.gluon.model_zoo import vision
import time
import os

batch_shape = (1, 3, 224, 224)

# Download a pretrained ResNet-18 from the Gluon model zoo and hybridize it
# so it can be exported in symbolic form.
resnet18 = vision.resnet18_v2(pretrained=True)
resnet18.hybridize()

# Run one forward pass to trigger shape inference, export the symbol and
# parameters, then reload them for use with the symbolic API.
resnet18.forward(mx.nd.zeros(batch_shape))
resnet18.export('resnet18_v2')
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v2', 0)

Baseline MXNet Network Performance

Code Block
# Create sample input
input = mx.nd.zeros(batch_shape)

# Execute with MXNet
os.environ['MXNET_USE_TENSORRT'] = '0'
executor = sym.simple_bind(ctx=mx.gpu(0), data=batch_shape, grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)

# Warmup
print('Warming up MXNet')
for i in range(0, 10):
	y_gen = executor.forward(is_train=False, data=input)
	y_gen[0].wait_to_read()

# Timing
print('Starting MXNet timed run')
start = time.process_time()
for i in range(0, 10000):
	y_gen = executor.forward(is_train=False, data=input)
	y_gen[0].wait_to_read()
end = time.process_time()
print(end - start)

TensorRT Integrated Network Performance

Code Block
# Execute with TensorRT
print('Building TensorRT engine')
os.environ['MXNET_USE_TENSORRT'] = '1'

# Merge the auxiliary parameters into the argument dict and move everything
# to the GPU before binding the TensorRT-enabled executor.
arg_params.update(aux_params)
all_params = dict([(k, v.as_in_context(mx.gpu(0))) for k, v in arg_params.items()])
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0), all_params=all_params,
                                             data=batch_shape, grad_req='null', force_rebind=True)
# Warmup
print('Warming up TensorRT')
for i in range(0, 10):
	y_gen = executor.forward(is_train=False, data=input)
	y_gen[0].wait_to_read()

# Timing
print('Starting TensorRT timed run')
start = time.process_time()
for i in range(0, 10000):
	y_gen = executor.forward(is_train=False, data=input)
	y_gen[0].wait_to_read()
end = time.process_time()
print(end - start)

The output should be identical whether the network is run through the MXNet executor or the TensorRT executor. The speedup should be roughly 1.8x, depending on the hardware and libraries used.
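
As a quick sanity check, you can compare the final outputs of the two runs directly. A minimal sketch, assuming you saved the last MXNet output as y_mxnet and the last TensorRT output as y_trt (names not used in the listings above):

Code Block
import numpy as np

# FP32 results from the two executors should agree to within small
# floating-point tolerances.
np.testing.assert_allclose(y_mxnet.asnumpy(), y_trt.asnumpy(),
                           rtol=1e-3, atol=1e-4)
print('MXNet and TensorRT outputs match')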

Roadmap

Finished Items

Initial Integration


Future Work

FP16 Integration

The current integration of TensorRT into MXNet supports only FP32 tensor values. Allowing FP16 values would enable many further optimizations on Jetson and Volta devices.

https://jira.apache.org/jira/browse/MXNET-1084
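
For context, stock MXNet can already cast a Gluon model to FP16 independently of TensorRT. A minimal sketch of that existing mechanism (not part of the TensorRT integration described here):

Code Block
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Cast a pretrained network and its input to float16 on the GPU.
net = vision.resnet18_v2(pretrained=True, ctx=mx.gpu(0))
net.cast('float16')
x = mx.nd.zeros((1, 3, 224, 224), ctx=mx.gpu(0), dtype='float16')
print(net(x).dtype)  # numpy.float16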

Subgraph Integration

The new subgraph API is a natural fit for TensorRT. To keep the codebase consistent, we'd like to port the current TensorRT integration to the new API. The experimental integration currently requires contrib API calls; once it has moved to the subgraph API, users will be able to use TensorRT through a consistent interface. Porting should also enable acceleration of Gluon and Module based models.

https://jira.apache.org/jira/browse/MXNET-1085
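
To make the direction concrete, here is a purely hypothetical sketch of what a subgraph-based flow might look like, using the generic get_backend_symbol partitioning hook; the 'TensorRT' backend name is an assumption about a future API, not something this page documents:

Code Block
# Hypothetical: partition the graph with a named subgraph backend instead
# of the contrib bind call used above.
trt_sym = sym.get_backend_symbol('TensorRT')
executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=batch_shape,
                               grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)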

Increased Operator (/Layer) Coverage

The current operator coverage is fairly limited.  We'd like to enable all models that TensorRT is able to work with. 

https://jira.apache.org/jira/browse/MXNET-1086


Currently supported operators:

Operator Name   | Operator Description | Status
----------------|----------------------|---------
Convolution     |                      | Complete
BatchNorm       |                      | Complete
elemwise_add    |                      | Complete
elemwise_sub    |                      | Complete
elemwise_mul    |                      | Complete
rsqrt           |                      | Complete
Pad             |                      | Complete
mean            |                      | Complete
FullyConnected  |                      | Complete
Flatten         |                      | Complete
SoftmaxOutput   |                      | Complete
Activation      | relu, tanh, sigmoid  | Complete


Operators to be added:


Operator Name   | Operator Description                         | Status
----------------|----------------------------------------------|------------
Deconvolution   | Required for several computer vision models. | In Progress
elemwise_div    | Required for some WaveNet implementations.   | In Progress


Benchmarks

TensorRT integration is still an experimental feature, so benchmarks are likely to improve over time. As of Oct 11, 2018, we've measured the following improvements, all collected with FP32 networks.

...

https://mxnet.incubator.apache.org/tutorials/tensorrt/inference_with_trt.html

Runtime Integration with TensorRT



...