...

Currently, only a very limited number of operators (such as exp) support second or higher order gradient calculation. For other operators, if users try to take the second order gradient, MXNet issues an error message such as "mxnet.base.MXNetError: [23:15:34] src/pass/gradient.cc:192: Operator _backward_XXX is non-differentiable because it didn't register FGradient attribute." This is because the MXNet backend does not implement the FGradient function for the backward node of the operator and therefore cannot support second (and higher) order gradients. See the example of the sin operator below: the gradient function is registered through a hidden operator, _backward_sin, but the _backward_sin operator itself does not register an FGradient function.

[Figure: FGradient registration of the sin operator through the hidden _backward_sin operator, which does not register an FGradient attribute]
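
For illustration, here is a minimal sketch of how a user would request a second order gradient through the autograd API (the input values are arbitrary); on a build where the operator's backward node has no FGradient registered, the second backward pass fails with the MXNetError quoted above:

    from mxnet import autograd, nd

    x = nd.array([0.5, 1.0, 2.0])
    x.attach_grad()
    with autograd.record():
        y = nd.sin(x)
        # First order gradient, keeping the graph so it can be differentiated again.
        dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    # Second order: works only if _backward_sin registers its own FGradient;
    # otherwise MXNet raises the error quoted above.
    dy_dx.backward()
    print(x.grad)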

Higher order gradient calculation is required for many applications, such as adaptive learning rate optimization [1], the W-GAN network [2], and network architecture search [3]. Implementing higher order gradients can help unlock these applications and improve the usability and popularity of the Apache MXNet framework.

...

Case I: Operators with a simple FGradient function: many simple tensor arithmetic operators are implemented with straightforward backward functions, e.g. the sin(x) operator shown below.

[Figure: backward function registration of the sin(x) operator via the hidden _backward_sin operator]

To support higher order gradients, we will replace the “_backward_sin” operator with an explicit cos(x) operator. Because the cos(x) operator has its own FGradient function, calling FGradient recursively lets us obtain derivatives of the sin(x) operator to any order. A preliminary implementation as a proof of concept is here.
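
Once such a registration is in place, the same autograd pattern shown earlier should produce the second derivative directly. The following sketch (assuming both sin and cos register differentiable backward passes) checks that d²/dx² sin(x) = -sin(x):

    import numpy as np
    from mxnet import autograd, nd

    x = nd.array([0.0, 0.5, 1.0])
    x.attach_grad()
    with autograd.record():
        y = nd.sin(x)
        dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    dy_dx.backward()
    # The second derivative of sin(x) is -sin(x).
    np.testing.assert_allclose(x.grad.asnumpy(), -np.sin(x.asnumpy()), atol=1e-6)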

Case II: Operators with a complex FGradient function: for some other operators the derivative is not as straightforward or as easily expressed analytically, e.g. the softmax operator, whose FGradient function is registered using a class:

[Figure: class-based FGradient registration of the softmax operator]
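
For reference, the analytic form of the softmax backward pass can still be written in terms of existing differentiable tensor operations. The following is only an illustrative sketch (the function name and layout are hypothetical, not the backend implementation):

    from mxnet import nd

    def softmax_backward(y, dy, axis=-1):
        # For y = softmax(x) along `axis`:
        # dL/dx = y * (dL/dy - sum_j(dL/dy_j * y_j))
        return y * (dy - (dy * y).sum(axis=axis, keepdims=True))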

For this type of operator, we have two options.

...

Case III: Operators implemented using third party libraries such as cuDNN and MKLDNN: many neural-network-specific operators, such as batchnorm, convolution, and pooling, use implementations directly from the cuDNN and/or MKLDNN libraries when available. An example of the convolution operator using the MKLDNN library is shown below:

[Figure: convolution operator implementation dispatching to the MKLDNN library]

In such cases we will have to depend on the third party library to provide higher order gradient support for these operators when users install MXNet builds that use those packages. For the first phase of this project, since we are dealing with multilayer perceptron networks, we do not need to implement higher order derivatives for these operators. We will survey the related third party packages and incorporate their higher order gradient implementations into MXNet.
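
Whether a particular MXNet build actually dispatches to these libraries can be checked at runtime. A small sketch using the mxnet.runtime feature API (feature names may differ slightly between releases):

    from mxnet.runtime import Features

    features = Features()
    # True only if this MXNet binary was built against the corresponding library.
    print(features.is_enabled('MKLDNN'))
    print(features.is_enabled('CUDNN'))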

...