
...

Import the Model in Gluon

Code Block

language	py

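# Load the model along with the transformations.
# Note: `import` is a reserved word in Python, so the method will need a
# different name to be callable as written here.
net = gluon.contrib.utils.import(symbol_file="my_model-symbol.json",
                                 param_file="my_model-0000.params",
                                 load_transforms=True,
                                 ctx='cpu')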


# Prediction
pred = net(data)

Import the Model in Module API

(Creating a module is supported for inference only.)

Code Block

language	py

# Load the model along with the transformations. Can be used only for inference (forward()).
mod = mx.contrib.Module.import(symbol_file="my_model-symbol.json",
                               param_file="my_model-0000.params",
                               load_transforms=True,
                               ctx='cpu',
                               batch_size=1)

# Prediction
mod.forward(...)

Import the Model in Java Predictor API

Code Block

language	java

List<Context> context = new ArrayList<>();
context.add(Context.cpu()); 
String modelPathPrefix = "my_model";


// Load the model along with the transformations.
Predictor predictor = new Predictor(modelPathPrefix, context, /* load_transforms = */ true);

// Inference
List<NDArray> result = predictor.predictWithNDArray(inputNDArray);

Performance Consideration

During inference, initial benchmarks show a noticeable performance gain with end-to-end models, i.e., models in which the data transformations and the neural network are fused into a single model graph.

Model - ResNet-18 pre-trained on ImageNet. https://s3.us-east-2.amazonaws.com/mxnet-public/end_to_end_models
Pre-processing - Resize(224, 224), ToTensor, Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
Each measurement is the average of 500 runs.
Single Request Inference - Input Data - Synthetic (random.uniform(0, 255, shape=(1, 300, 300, 3)))
Batch Inference - Input Data - Synthetic (random.uniform(0, 255, shape=(25, 300, 300, 3)))
The reported times are the average prediction time per sample, in milliseconds.
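The measurement procedure described above (synthetic input, repeated runs, average time per sample) can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the harness used to produce the numbers in this document. The names synthetic_batch, avg_ms_per_sample and dummy_predict are hypothetical, and dummy_predict merely stands in for a model's forward pass.

```python
import random
import time


def synthetic_batch(batch_size, height=300, width=300, channels=3):
    """Flat list of uniform random pixel values between 0 and 255."""
    n = batch_size * height * width * channels
    return [random.uniform(0, 255) for _ in range(n)]


def avg_ms_per_sample(predict, batch, batch_size, runs=500):
    """Average wall-clock prediction time per sample, in milliseconds."""
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        total_seconds += time.perf_counter() - start
    # Average over runs, then divide by the batch size to get a per-sample time.
    return total_seconds / runs / batch_size * 1000.0


def dummy_predict(batch):
    # Hypothetical stand-in for a model's forward pass.
    return sum(batch) / len(batch)


# A small number of runs keeps the demo quick; the document uses 500.
single = avg_ms_per_sample(dummy_predict, synthetic_batch(1), batch_size=1, runs=20)
print(f"{single:.3f} ms per sample")
```

With a real MXNet model, execution is asynchronous, so a harness along these lines would also need to wait for the result (for example with mx.nd.waitall()) before stopping the timer.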

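Hardware	Inference Type	API	Non End-to-End Models (ms)	End-to-End Models (ms)	Boost %
CPU (C5.2X)	Single Request Inference	Python (Module API)	17	14	17.65%
CPU (C5.2X)	Single Request Inference	Java Inference APIs	17.09	14.16	17.14%
CPU (C5.2X)	Single Request Inference	Scala Inference APIs	17.93	13.19	26.44%
CPU (C5.2X)	Batch Inference (Batch size = 25)	Python (Module API)	15.18	12.57	17.19%
CPU (C5.2X)	Batch Inference (Batch size = 25)	Java Inference APIs	18.54	13	29.88%
CPU (C5.2X)	Batch Inference (Batch size = 25)	Scala Inference APIs	17	13.26	22.00%
GPU (P3.16X)	Single Request Inference	Python (Module API)	5.78	3.14	45.67%
GPU (P3.16X)	Single Request Inference	Java Inference APIs	8.95	4.26	52.40%
GPU (P3.16X)	Single Request Inference	Scala Inference APIs	9.14	4.42	51.64%
GPU (P3.16X)	Batch Inference (Batch size = 25)	Python (Module API)	2.61	1.31	49.81%
GPU (P3.16X)	Batch Inference (Batch size = 25)	Java Inference APIs	8.03	5.53	31.13%
GPU (P3.16X)	Batch Inference (Batch size = 25)	Scala Inference APIs	7.86	5.52	29.77%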
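
The Boost % column is consistent with the relative reduction in per-sample latency, (non end-to-end - end-to-end) / non end-to-end. The short sketch below recomputes two of the table entries under that reading. The function name boost_percent is hypothetical and is used only for illustration.

```python
def boost_percent(baseline_ms, end_to_end_ms):
    """Relative reduction in latency, as a percentage of the baseline."""
    return (baseline_ms - end_to_end_ms) / baseline_ms * 100.0


# (non end-to-end ms, end-to-end ms) pairs taken from the table
cpu_single_python = boost_percent(17, 14)
gpu_batch_python = boost_percent(2.61, 1.31)

print(f"{cpu_single_python:.2f}%")  # 17.65%
print(f"{gpu_batch_python:.2f}%")   # 49.81%
```

Note that this is a reduction in time, not a speedup factor: a 50% boost in this sense corresponds to roughly twice the throughput.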

Backward compatibility

All API changes are backward compatible.
Old MXNet models should still load without breakage on the new MXNet version.
New MXNet models will not work on old MXNet versions. If a user tries to load a new MXNet model with an older MXNet version, they will get errors such as "unknown field 'inputs', 'outputs' in the model", because the model JSON parser schema in old MXNet does not understand the new fields introduced as part of this work.
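
As a rough illustration of the incompatibility described above, a tool could inspect a symbol JSON file for the new fields before handing it to an older MXNet version. The sketch below is an assumption-laden example: the field names come from the quoted error message, but their placement at the top level of the JSON is assumed here, and new_fields_present is a hypothetical helper, not an MXNet API.

```python
import json

# Field names are taken from the error message quoted above. Their placement
# at the top level of the symbol JSON is an assumption made for this sketch.
NEW_FIELDS = ("inputs", "outputs")


def new_fields_present(symbol_json_text):
    """Return the new-format fields found at the top level of a symbol JSON."""
    graph = json.loads(symbol_json_text)
    return [name for name in NEW_FIELDS if name in graph]


# Toy symbol files: one in the existing layout, one with the new fields added.
old_style = '{"nodes": [], "arg_nodes": [], "heads": []}'
new_style = '{"nodes": [], "arg_nodes": [], "heads": [], "inputs": [], "outputs": []}'

print(new_fields_present(old_style))  # []
print(new_fields_present(new_style))  # ['inputs', 'outputs']
```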

Open Questions and Assumptions Made

Here we assume that an internal operator can act as a no-op connector node. We assume these connector nodes are not exposed to users and hence can safely be used as connector nodes for fusing the transformation graph and the neural network graph. Is this assumption in the right direction?
Is it OK to go to Contrib first and then update the main APIs? The reasoning is below:
1. The API changes and additions in this work are proposed to go into Contrib because the "export" and "import" APIs are among the most widely used by MXNet users. We propose to mature the API and address any usability or design concerns while in Contrib before graduating to the main APIs.
2. What is the plan to monitor and move this change from Contrib to main? Once we build these extended and new export and import APIs, we will create new tutorials and update the existing export/import tutorials. We will track issues and discussions for any user concerns. We will graduate these APIs after they have matured in Contrib for at least one MXNet release.

Alternate Solution

Keep the transformation graph and the network graph separate and independent of each other, and fuse them at run time.
1. In the proposed approach, we fuse the transformations and the neural network and export them as a single graph. We introduce a no_op_identifier operator to identify the link between the transformations and the neural network.
2. In this alternate solution, we would keep the transformation graph and the network graph separate (in the same symbol file or in multiple symbol files). These independent graphs can then be fused at run time.
3. Two concerns with this approach:
  1. In MXNet, a symbol file and a params file are the foundational building blocks for representing one graph and its parameters. It may not be ideal to keep multiple graphs in the same symbol file and params file.
  2. If we decide to keep the transformation and network graphs as separate files, this can already be achieved today with the current MXNet APIs without any additional changes.
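
To make the role of the connector node in item 1 concrete, here is a toy sketch in which each graph is reduced to a plain list of operator names. This is not MXNet's symbol format or API. The fuse and split helpers are hypothetical, and only the no_op_identifier name comes from this proposal.

```python
CONNECTOR = "no_op_identifier"


def fuse(transform_ops, network_ops, connector=CONNECTOR):
    """Join two toy operator lists into one, linked by a no-op connector."""
    return list(transform_ops) + [connector] + list(network_ops)


def split(fused_ops, connector=CONNECTOR):
    """Recover the transformation and network parts via the connector."""
    i = fused_ops.index(connector)
    return fused_ops[:i], fused_ops[i + 1:]


transforms = ["Resize", "ToTensor", "Normalize"]
network = ["Convolution", "Activation", "FullyConnected"]

fused = fuse(transforms, network)
print(fused)
print(split(fused) == (transforms, network))  # True
```

Because the connector marks the boundary, a loader can either run the fused graph as a whole or locate the network part on its own, which is the behaviour that load_transforms is meant to toggle in the import examples.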
