...

Lin Yuan, Pedro Larroy

Feature Shepherd

Xingjian Shi

Problem

Currently, only a very limited number of operators (such as exp) support second or higher order gradient calculation. For other operators, if users try to compute the second order gradient, MXNet issues an error message such as "mxnet.base.MXNetError: [23:15:34] src/pass/gradient.cc:192: Operator _backward_XXX is non-differentiable because it didn't register FGradient attribute." This is because the MXNet backend does not implement the FGradient function for the backward node of the operator and therefore cannot support second (and higher) order gradients. See the example of the sin operator below. Its gradient function is registered through a hidden operator, _backward_sin. However, the _backward_sin operator does not itself register an FGradient function.
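
As a minimal reproduction from the Python frontend (a sketch only; it assumes an MXNet build in which sin does not yet support second order gradients and uses the mxnet.autograd API), the error above can be triggered as follows:

    import mxnet as mx
    from mxnet import autograd, nd

    x = nd.array([1.0, 2.0, 3.0])
    x.attach_grad()

    with autograd.record():
        y = nd.sin(x)
        # create_graph=True asks autograd to record the backward computation itself
        # so that the resulting gradient can be differentiated again.
        dy_dx = autograd.grad(y, [x], create_graph=True, retain_graph=True)[0]

    # Differentiating dy_dx requires an FGradient registration on _backward_sin.
    # Without it, this call fails with the "non-differentiable" error quoted above.
    dy_dx.backward()
    print(x.grad)  # with second order support, this would hold d2y/dx2 = -sin(x)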

...

Higher order gradient calculation is required for many applications, such as adaptive learning rate optimization [1], the WGAN-GP network [2], neural architecture search [3], etc. Implementing higher order gradients can help unlock these applications and improve the usability and popularity of the Apache MXNet framework.

...

[1] Forward and Reverse Gradient-Based Hyperparameter Optimization

[2] Improved Training of Wasserstein GANs

[3] Neural Architecture Search with Reinforcement Learning

...