...
```python
def convert_model(sym, arg_params, aux_params, target_dtype="float16",
                  target_dtype_ops=None, fp32_ops=None, widest_dtype_ops=None,
                  conditional_fp32_ops=None, excluded_sym_names=None):
    """API for converting a model from an FP32 model to a mixed precision model.

    MXNet tries to convert the FP32 model to a mixed precision model by adding
    cast layers using amp_cast and amp_multicast operators. The decision on
    which cast layer to add is based on hardcoded lists for Automatic Mixed
    Precision in MXNet. These lists can be overridden by the user by providing
    their own lists using: target_dtype_ops, fp32_ops, widest_dtype_ops,
    conditional_fp32_ops

    Parameters
    ----------
    sym : str or Symbol
        Defines the structure of a neural network for FP32 types.
    arg_params : dict
        Dictionary of name to `NDArray`.
    aux_params : dict
        Dictionary of name to `NDArray`.
    target_dtype : str
        Currently only supports float16. The target dtype indicates to add cast
        layers when possible so that lower precision computation can be leveraged.
    target_dtype_ops : list of strs
        Override the list of operator names casted to target_dtype.
        If None, uses the framework's default list to be casted to target dtype.
    fp32_ops : list of strs
        Override the list of operator names casted to FP32.
        If None, uses the framework's default list to be casted to FP32.
    widest_dtype_ops : list of strs
        A list of op names provided by the user which should run in the widest
        precision among their inputs. If None, uses the framework's default
        list of widest_precision_ops.
    conditional_fp32_ops : list of (string, string, list of string)
        Override the list of operators to be casted to FP32.
        The format of the list is (name of the function, name of the parameter,
        list of values of the parameter that make the operator to be casted to FP32).
    excluded_sym_names : list of strs
        A list of strings that represent the names of symbols that users want
        to exclude from being quantized.
    """
```
...
```python
def convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
                         fp32_ops=None, widest_dtype_ops=None,
                         conditional_fp32_ops=None, excluded_sym_names=None,
                         input_names=['data']):
    """Given a hybrid block/symbol block representing a neural network of data
    type FP32 and target_dtype, return a block with mixed precision support.

    Parameters
    ----------
    block : HybridBlock or SymbolBlock object
        FP32 HybridBlock or SymbolBlock object
    target_dtype : str or numpy
        Currently only supports float16. The target dtype indicates to add cast
        layers when possible so that lower precision computation can be leveraged.
    target_dtype_ops : list of strs
        Override the list of operator names casted to target_dtype.
        If None, uses the framework's default list to be casted to target dtype.
    fp32_ops : list of strs
        Override the list of operator names casted to FP32.
        If None, uses the framework's default list to be casted to FP32.
    widest_dtype_ops : list of strs
        A list of op names provided by the user which should run in the widest
        precision among their inputs. If None, uses the framework's default
        list of widest_precision_ops.
    conditional_fp32_ops : list of (string, string, list of string)
        Override the list of operators to be casted to FP32.
        The format of the list is (name of the function, name of the parameter,
        list of values of the parameter that make the operator to be casted to FP32).
    excluded_sym_names : list of strs
        A list of strings that represent the names of symbols that users want
        to exclude from being quantized.
    input_names : list of strs
        A list of strings representing the names of input variables
    """
```
User experience will be similar to the export API experience today. Users will have to call hybridize followed by one forward pass before calling convert_model.
...
Add an NNVM pass for the backend. This pass would use the AMP lists based on the original_dtype and target_dtype.
The pass will perform a graph traversal and add amp_cast and amp_multicast layers for FP16 and FP32 ops based on the op whitelists and excluded_sym_names. Some of the ideas have been borrowed from the quantization pass added as part of quantization support [2].
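The idea behind the pass can be pictured with a minimal, framework-agnostic sketch. Everything here is illustrative: the node encoding (plain tuples), the whitelists, and the naming scheme are stand-ins, not MXNet's real NNVM data structures or AMP lists.

```python
# Toy sketch of the ReducePrecision pass idea: walk nodes in topological order
# and insert amp_cast marker nodes before the inputs of whitelisted ops.
# Whitelists below are illustrative only, not MXNet's real AMP lists.
FP16_OPS = {"Convolution", "FullyConnected"}  # ops to run in target_dtype
FP32_OPS = {"SoftmaxOutput"}                  # ops forced to stay in FP32

def reduce_precision(nodes, excluded_sym_names=()):
    """nodes: list of (name, op, inputs) tuples in topological order."""
    out = []
    for name, op, inputs in nodes:
        if name not in excluded_sym_names and op in FP16_OPS | FP32_OPS:
            suffix = "fp16" if op in FP16_OPS else "fp32"
            casts = ["%s_amp_cast_%s" % (i, suffix) for i in inputs]
            # insert one cast node per input, then rewire the op to the casts
            out.extend((c, "amp_cast", [i]) for c, i in zip(casts, inputs))
            out.append((name, op, casts))
        else:
            out.append((name, op, inputs))
    return out
```

Ops named in excluded_sym_names pass through untouched, mirroring how the real pass skips user-excluded symbols.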
...
After the mixed precision pass is done and the amp_cast and amp_multicast layers are added, the symbolic representation needs to be modified to store the right dtype attrs for some of its inputs. This will require running the InferType pass after the NNVM ReducePrecision pass and then using the obtained information to set the data types of weights and auxiliary states.
This ensures that the dtype corresponding to each param or aux input is correct: the arg_params and aux_params are casted accordingly inside convert_model, so they match the symbol returned by convert_model.
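The param-casting step can be pictured with a stdlib-only sketch. The "arrays" here are plain dicts carrying a dtype tag, and `inferred_dtypes` stands in for the results of the InferType pass; real code would call `NDArray.astype` on MXNet arrays instead.

```python
# Toy sketch: after type inference, cast each param/aux entry to its inferred
# dtype so the returned arg_params/aux_params agree with the symbol's dtype
# attrs. Data structures are illustrative, not MXNet's.
def cast_params(params, inferred_dtypes):
    """params: name -> {"dtype": str, "data": ...};
    inferred_dtypes: name -> dtype string from the InferType pass."""
    casted = {}
    for name, arr in params.items():
        want = inferred_dtypes.get(name, arr["dtype"])
        if want != arr["dtype"]:
            arr = {"dtype": want, "data": arr["data"]}  # stand-in for astype
        casted[name] = arr
    return casted
```

Params not mentioned by type inference keep their original dtype, which matches the behavior described in the FAQ below: only params that inference decides must be float16 get casted.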
Gluon Changes
For Gluon code, we need to add an internal API to retrieve sym, arg_params and aux_params from a hybrid_block. Following this, convert_model can be used to convert a symbol json, model params and auxiliary params. After conversion, the symbolic model (json, arg_params, aux_params) can be imported back into Gluon with SymbolBlock.imports. The returned SymbolBlock is ready to use for inference.
Frontend Bindings
Need to add AMP convert_model API support for different language bindings such as C++, Scala, etc.
FAQ
...
API will have amp_cast and amp_multicast symbols and the "__dtype__" attribute of weight and aux symbols will be updated. Also the returned arg_params and aux_params ndarrays will have the same dtype as the "__dtype__" attribute in the returned symbol.
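As an illustration of the attribute update, here is a stdlib-only sketch over MXNet's JSON graph layout (nodes with "op", "name" and "attrs" fields). Note that the dtype values are shown as strings for readability, whereas MXNet actually stores an integer type code in "__dtype__".

```python
import json

# Sketch: set the "__dtype__" attr on variable ("null") nodes of a symbol's
# JSON representation, using dtypes obtained from type inference.
# dtype values here are strings; MXNet stores an integer type code.
def set_dtype_attrs(sym_json, inferred_dtypes):
    graph = json.loads(sym_json)
    for node in graph["nodes"]:
        if node["op"] == "null" and node["name"] in inferred_dtypes:
            node.setdefault("attrs", {})["__dtype__"] = inferred_dtypes[node["name"]]
    return json.dumps(graph)
```

Only variable nodes (weights, aux states, inputs) are touched; operator nodes keep their attrs unchanged.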
Performance
Setup
EC2 Instance: p3.8xlarge
CUDNN: 7.4.2
CUDA: 10.0
Commit Hash: b3b952f9d5490ee2707209ab866e6c3f094e2046 (PoC changes made on top of this built from source)
Mixed Precision Models:
Resnet50_v1: JSON File, Params File
imagenet1k-resnet-152: JSON File, Params File
Results
| Model | Batch Size | Original Model (samples/sec) | Mixed Precision Model (samples/sec) | Original Model with Implicit Type Conversion (MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION=1) (samples/sec) |
|---|---|---|---|---|
| imagenet1k-resnet-152 | 1 | 85 | 72 | 72 |
| imagenet1k-resnet-152 | 2 | 140 | 140 | 142 |
| imagenet1k-resnet-152 | 4 | 240 | 270 | 228 |
| imagenet1k-resnet-152 | 8 | 320 | 470 | 261 |
| imagenet1k-resnet-152 | 16 | 405 | 680 | 315 |
| resnet50_v1 | 1 | 215 | 165 | 205 |
| resnet50_v1 | 2 | 370 | 330 | 365 |
| resnet50_v1 | 4 | 560 | 600 | 545 |
| resnet50_v1 | 8 | 760 | 980 | 635 |
| resnet50_v1 | 16 | 935 | 1400 | 790 |
FAQ
Will the arg_params and aux_params be casted to FP16?
Inputs of ops running in FP16 will be casted. Other params may or may not be casted, based on the type inference logic.
How is this different from casting inputs to FP16 and casting params to FP16 in Gluon?
Casting inputs and params to FP16 in Gluon ensures that you are able to execute the model in FP16 precision. Generally, some ops may need to run in FP16 while others need to run in FP32 for accuracy and performance reasons. This is where the AMP APIs are useful.
Will the dtype attribute in the serialized model change after convert_model is called?
Yes, the dtype attribute in the serialized model can change after convert_model is called. This depends on how the whitelist affects the model in question and whether type inference decides that certain params need to be in float16.
Is there a need for hybridizing and running a forward pass for the AMP-converted Gluon model?
No, there is no need to hybridize, since the API returns SymbolBlocks which are already hybridized.
What changes need to be made to an existing script to convert a model and run inference with the mixed precision model?
Adding a call to amp.convert_model or amp.convert_block should be sufficient to convert a model to mixed precision and run inference on it. Below are two user experience examples:
Module API
```python
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint("resnet18", 0)
# Additional line below to convert to a mixed precision model. Everything else remains the same
result_sym, arg_params, aux_params = mx.contrib.amp.convert_model(
    sym, arg_params, aux_params, target_dtype="float16")
mod = mx.mod.Module(result_sym, data_names=['data'], label_names=None, context=mx.cpu())
mod.bind(data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)
mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))], label=None))
result = mod.get_outputs()[0].asnumpy()
```
Gluon API
```python
import mxnet as mx
from mxnet.gluon.model_zoo.vision import get_model

net = get_model(name="resnet50_v1", classes=1000, pretrained=True)
net.hybridize()
x = mx.nd.random.uniform(0, 1, shape=(1, 3, 224, 224))
out = net(x)
# Additional line below to convert to a mixed precision model. Everything else remains the same
net = mx.contrib.amp.convert_block(net, target_dtype="float16")
out = net(x)
```
Whether params are casted depends on the whitelists provided. The default whitelists have been selected to avoid casting of the params for commonly used convnet networks. If the whitelist is such that type inference decides that a certain param needs to be float16, it will be casted.
...