
...

2. To provide the weights from MXNet (NNVM) to the TensorRT graph converter before the symbol is fully bound (before memory is allocated, etc.), arg_params and aux_params must be passed to the symbol's simple_bind method. The weights and other values (e.g. the moments learned from data by batch normalization, which come from aux_params) are passed through the shared_buffer argument of simple_bind as follows:

Code Block
executor = sym.simple_bind(ctx=ctx, data=data_shape,
                           softmax_label=sm_shape, grad_req='null',
                           shared_buffer=all_params, force_rebind=True)

3. To collect arg_params and aux_params from the dictionaries loaded by mx.model.load_checkpoint(), we need to combine them into one dictionary:

Code Block
import mxnet as mx

def merge_dicts(*dict_args):
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    # Return only after every dictionary has been merged
    return result

sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, epoch)
all_params = merge_dicts(arg_params, aux_params)
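As a quick check of the merge semantics, here is a minimal sketch that uses plain dictionaries in place of the real arg_params and aux_params. The key names and values are made up for illustration; in practice the values are NDArrays loaded from the checkpoint.

```python
def merge_dicts(*dict_args):
    """Merge any number of dictionaries; later ones win on duplicate keys."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result


# Hypothetical stand-ins for arg_params / aux_params (real values are NDArrays).
arg_params = {"fc1_weight": [0.1, 0.2], "fc1_bias": [0.0]}
aux_params = {"bn1_moving_mean": [0.5], "bn1_moving_var": [1.0]}

all_params = merge_dicts(arg_params, aux_params)
print(sorted(all_params))
```

Because dict.update overwrites existing keys, a name that appears in both inputs would silently take the value from the later dictionary. That should not normally happen here, since weights and auxiliary states are expected to have distinct names.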

This all_params dictionary can be seen in use in the simple_bind call in #2.

4. Once the symbol is bound, we need to feed the data and run the forward() method. Let's say we're using a test set data iterator called test_iter. We can run inference as follows:

Code Block
import numpy as np

for idx, dbatch in enumerate(test_iter):
    data = dbatch.data[0]
    # Copy the batch into the executor's input buffer
    executor.arg_dict["data"][:] = data
    executor.forward(is_train=False)
    preds = executor.outputs[0].asnumpy()
    # Predicted class index for each example in the batch
    top1 = np.argmax(preds, axis=1)
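The loop above stops at the per-batch top-1 predictions. One plausible way to turn them into an accuracy percentage is sketched below with plain Python lists standing in for the model outputs. The helper names top1_indices and accuracy_pct are hypothetical and the scores are made up; in the real loop, np.argmax(preds, axis=1) plays the role of top1_indices and the labels come from the data iterator.

```python
def top1_indices(preds):
    """Return the index of the largest score in each row (argmax over axis 1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in preds]


def accuracy_pct(batches):
    """Accumulate top-1 accuracy over (preds, labels) batches, as a percentage."""
    correct = 0
    total = 0
    for preds, labels in batches:
        top1 = top1_indices(preds)
        correct += sum(int(p == l) for p, l in zip(top1, labels))
        total += len(labels)
    return 100.0 * correct / total


# Two made-up batches of class scores with their ground-truth labels.
batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
    ([[0.3, 0.7], [0.6, 0.4]], [0, 0]),  # first wrong, second correct
]
result = accuracy_pct(batches)
print(result)
```

Counting correct predictions and examples across batches, rather than averaging per-batch accuracies, keeps the result right even when the last batch is smaller than the others.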

5. Note: One can choose between running inference with and without TensorRT. This can be selected by changing the state of the MXNET_USE_TENSORRT environment variable. Let's first write a convenience function to change the state of this environment variable:

Code Block
import os

def set_use_tensorrt(status=False):
    os.environ["MXNET_USE_TENSORRT"] = str(int(status))

Now, assuming that the logic to bind a symbol and run inference on a dataset in batches of batch_size is wrapped in a run_inference function, we can do the following:

Code Block

print("Running inference in MXNet")
set_use_tensorrt(False)
mx_pct = run_inference(sym, arg_params, aux_params, mnist, all_test_labels, batch_size=batch_size)

print("Running inference in MXNet-TensorRT")
set_use_tensorrt(True)
trt_pct = run_inference(sym, arg_params, aux_params, mnist, all_test_labels, batch_size=batch_size)

Simply switching the flag allows us to go back and forth between MXNet and MXNet-TensorRT inference. See the details in the unit test at ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_lenet5.py.
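Setting the flag this way leaves MXNET_USE_TENSORRT at whatever value was set last. If that matters, for instance in a longer script or test suite, one option is a context manager that restores the previous value on exit. The sketch below is an illustration rather than part of MXNet: the name use_tensorrt is hypothetical, and it assumes, as the example above implies, that the variable is read each time a symbol is bound.

```python
import os
from contextlib import contextmanager


@contextmanager
def use_tensorrt(status):
    """Temporarily set MXNET_USE_TENSORRT, restoring the previous value on exit."""
    previous = os.environ.get("MXNET_USE_TENSORRT")
    os.environ["MXNET_USE_TENSORRT"] = str(int(status))
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before; remove it again.
            os.environ.pop("MXNET_USE_TENSORRT", None)
        else:
            os.environ["MXNET_USE_TENSORRT"] = previous


os.environ.pop("MXNET_USE_TENSORRT", None)  # start from a clean state for the demo
with use_tensorrt(True):
    inside = os.environ["MXNET_USE_TENSORRT"]  # value seen while the block runs
after = os.environ.get("MXNET_USE_TENSORRT")   # value seen after the block exits
```

A call such as run_inference(...) would go inside the with block, so that each measurement runs under an explicit setting and the environment is left as it was found.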

...

For Gluon models specifically, we need to add a data symbol to the model to load the data, and apply a softmax layer, because Gluon models output only the logits that are fed to softmax. This is shown in ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_resnet_resnext.py. Here's the relevant code:

Code Block
net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
data = mx.sym.var('data')
out = net(data)
softmax = mx.sym.SoftmaxOutput(out, name='softmax')

As in the symbolic API case, we need to provide the weights during the simple_bind call, so we need to extract them first. Gluon gives very easy access to the weights: we can extract them directly from the network object and then provide them during the simple_bind call:

Code Block

net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
all_params = dict([(k, v.data()) for k, v in net.collect_params().items()])
executor = softmax.simple_bind(ctx=ctx, data=(batch_size, 3, 32, 32), softmax_label=(batch_size,), grad_req='null',
                                   shared_buffer=all_params, force_rebind=True)

...

Note that for Gluon-trained models, we should use Gluon's data pipeline to replicate the behavior of the pipeline that was used for training (e.g. using the same data scaling). Here's how to get the Gluon data iterator for the CIFAR-10 examples:

Code Block
gluon.data.DataLoader(
    gluon.data.vision.CIFAR10(train=False).transform_first(transform_test),
    batch_size=batch_size, shuffle=False, num_workers=num_workers)

For more details, see the unit test examples at ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_resnet_resnext.py.

...
