New Features

Automatic Mixed Precision (experimental)

Training deep learning networks is a very computationally intensive task. Novel model architectures tend to have increasing numbers of layers and parameters, which slows down training. Fortunately, software optimizations and new generations of training hardware keep the task feasible. However, most of the hardware and software optimization opportunities lie in exploiting lower precision (e.g. FP16), for example to utilize the Tensor Cores available on the new Volta and Turing GPUs. While FP16 training has shown great success in image classification tasks, other, more complicated neural networks have typically stayed in FP32 due to the difficulty of applying the FP16 training guidelines. That is where AMP (Automatic Mixed Precision) comes into play: it automatically applies the FP16 training guidelines, using FP16 precision where it provides the most benefit, while conservatively keeping operations that are unsafe in FP16 in full FP32 precision. To learn more about AMP, check out this tutorial.
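
A minimal sketch of what enabling AMP can look like in a Gluon training step is shown below. The API names (amp.init, amp.init_trainer, amp.scale_loss) are from the mxnet.contrib.amp package described in the tutorial; the tiny network and random batch are placeholders, not part of this release's examples.

```python
# Hedged sketch: enabling AMP for a single Gluon training step.
# amp.init() patches operators to run in FP16 where considered safe;
# amp.init_trainer() plus amp.scale_loss() add dynamic loss scaling.
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.contrib import amp

amp.init()                      # call before the network is created

ctx = mx.gpu(0)                 # AMP targets Tensor Core capable GPUs (Volta/Turing)
net = gluon.nn.Dense(10)
net.initialize(mx.init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
amp.init_trainer(trainer)

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
data = mx.nd.random.uniform(shape=(8, 32), ctx=ctx)   # placeholder batch
label = mx.nd.zeros((8,), ctx=ctx)                    # placeholder labels

with autograd.record():
    out = net(data)
    loss = loss_fn(out, label)
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
trainer.step(data.shape[0])
```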

MKL-DNN Reduced precision inference and RNN API support

Two advanced features, fused computation and reduced-precision kernels, were introduced by MKL-DNN in its recent version. These features can significantly speed up inference performance on CPU for a broad range of deep learning topologies. The MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications, including image classification, object detection, and natural language processing. Refer to the MKL-DNN operator documentation for more information.
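
As a hedged illustration of the reduced-precision path, the sketch below converts a pre-trained symbolic model to INT8 using the mxnet.contrib.quantization API; the 'resnet-18' checkpoint name and the calibration-free mode are assumptions chosen for brevity, not part of this release's documentation.

```python
# Hedged sketch: INT8 quantization of a symbolic model for MKL-DNN (CPU) inference.
# The 'resnet-18' checkpoint is a placeholder; calib_mode='none' skips offline
# calibration, so quantization ranges are determined at runtime.
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-18', 0)

qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params,
    ctx=mx.cpu(),
    quantized_dtype='int8',
    calib_mode='none')

mx.model.save_checkpoint('resnet-18-int8', 0, qsym, qarg_params, qaux_params)
```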

Dynamic Shape (experimental)

MXNet now supports Dynamic Shape in both imperative and symbolic mode. MXNet used to require that operators statically infer the output shapes from the input shapes. However, there exist some operators that don't meet this requirement. Examples are:

  • while_loop: its output size depends on the number of iterations in the loop.
  • boolean indexing: its output size depends on the value of the input data.
  • shape-dependent operators: many operators can be extended to take a shape symbol as input, and that shape symbol can determine the output shape (with this extension, the symbol interface of MXNet can fully support shape).

To support dynamic shape and such operators, the MXNet backend has been modified. MXNet now supports operators with dynamic shape, such as contrib.while_loop, contrib.cond, and mxnet.ndarray.contrib.boolean_mask; a minimal boolean_mask example is sketched below.

Note: dynamic shape currently does not work with Gluon deferred initialization.
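
The following small sketch shows why boolean_mask requires dynamic shape support: its output shape depends on the values in the mask, so it cannot be inferred statically.

```python
# boolean_mask output shape depends on how many mask entries are non-zero,
# so it can only be determined at runtime (a dynamic shape).
import mxnet as mx

data = mx.nd.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
mask = mx.nd.array([0, 1, 1])               # keep rows 1 and 2

out = mx.nd.contrib.boolean_mask(data, mask)
print(out.shape)                            # (2, 3), known only after execution
```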

Large Tensor Support

Currently, MXNet supports a maximal tensor size of around 4 billion elements (2^32). This is because uint32_t is used as the default data type for tensor size, as well as for variable indexing. This limitation has created many problems when larger tensors are used in a model. A naive solution would be to replace every uint32_t in the MXNet backend source code with int64_t. This solution is not viable, however. First, many data structures use uint32_t as the data type for their members, and unnecessarily replacing these with int64_t would increase memory consumption, creating another limitation. Second, MXNet has many submodule dependencies, so updating the variable types in the MXNet repository alone is not enough; we also need to make sure that libraries such as MKL-DNN and MShadow support the int64_t integer data type. Third, many front-end APIs assume an unsigned 32-bit integer interface, and only updating the interface in C/C++ would cause all the language bindings to fail. Therefore, a systematic approach is needed to enhance MXNet to support large tensors. You can now enable large tensor support by setting the following build flag to 1: USE_INT64_TENSOR_SIZE = 1. Note that it is set to 0 by default. For more details, please refer to the design document.
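
As a small hedged sketch, assuming the runtime feature-detection API available in this release, you can check whether an installed build was compiled with large tensor support:

```python
# Hedged sketch: query the build-time feature flags of the installed MXNet.
# 'INT64_TENSOR_SIZE' should report True only on builds compiled with
# USE_INT64_TENSOR_SIZE = 1.
from mxnet.runtime import Features

features = Features()
print(features.is_enabled('INT64_TENSOR_SIZE'))
```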

Dependency Update

MXNet has added support for CUDA 10, CUDA 10.1, cuDNN 7.5, NCCL 2.4.2, and NumPy 1.16.0. These updates are available through PyPI packages and source builds; refer to the installation guide for more details.

Gluon Fit API (experimental)

Training a model in Gluon requires users to write the training loop. This is useful because of its imperative nature; however, repeating the same boilerplate code across multiple models can become tedious, and the training loop can be overwhelming for users new to deep learning. We have introduced an Estimator and Fit API to help facilitate the training loop. Note: this feature is still experimental; for more details, refer to the design document.
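
A minimal sketch of the fit API follows. The module path mxnet.gluon.contrib.estimator and the constructor arguments are assumed from the experimental contrib package, and the synthetic dataset is a placeholder standing in for a real DataLoader.

```python
# Hedged sketch: training a small Gluon model with the experimental fit API
# instead of a hand-written training loop.
import mxnet as mx
from mxnet import gluon
from mxnet.gluon.contrib.estimator import Estimator

# Placeholder synthetic data: 100 samples of 20 features, 10 classes.
X = mx.nd.random.uniform(shape=(100, 20))
y = mx.nd.array([i % 10 for i in range(100)])
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=10)

net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(64, activation='relu'), gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())

est = Estimator(net=net,
                loss=gluon.loss.SoftmaxCrossEntropyLoss(),
                metrics=mx.metric.Accuracy(),
                trainer=gluon.Trainer(net.collect_params(), 'sgd',
                                      {'learning_rate': 0.01}))
est.fit(train_data=train_data, epochs=2)   # replaces the manual training loop
```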

New Operators

  • split_v2 (#13687)
  • Gradient multiplier (contrib) operator (#13632)
  • Image normalize operator - GPU support, 3D/4D inputs (#13802)
  • Image ToTensor operator - GPU support, 3D/4D inputs (#13837)
  • Add Gluon Transformer Crop (#14259)
  • GELU (#14449)
  • AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728)
  • [MXNET-1382] Add the index_array operator (#14638)
  • add an operator for computing the likelihood of a Hawkes self-exciting process (#14683)
  • Add numpy linspace (#14927)

How to build MXNet

Please follow the instructions at https://mxnet.incubator.apache.org/install/index.html

...