
...

GRU throughput (samples/sec, samples=20):

samples=20 (samples/sec) | GRUCell (SKX-8180) | sym.RNN (SKX-8180) | sym.RNN/GRUCell (8180) | GRUCell (P100) | sym.RNN (P100) | Speedup: sym.RNN(8180)/sym.RNN(P100) | Speedup: sym.RNN(8180)/GRUCell(P100)
Inference | 26.67 | 88.9 | 333% | 40.57 | 357.14 | 25% | 219%
Training (fwd+bwd) | 15.04 | 39.2 | 261% | 27.62 | 140.85 | 28% | 142%
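
For context, below is a minimal sketch of the two code paths being compared: an explicitly unrolled GRUCell graph versus the fused sym.RNN kernel exposed through mx.rnn.FusedRNNCell. The shapes (seq_len, batch, hidden size) are illustrative assumptions, not necessarily the benchmarked configuration.

```python
import mxnet as mx

seq_len, batch, num_hidden = 35, 20, 800  # illustrative sizes, not the measured config
data = mx.sym.Variable('data')            # assumed shape: (seq_len, batch, num_hidden)

# Unrolled GRUCell path: one small per-step subgraph for every time step.
cell = mx.rnn.GRUCell(num_hidden=num_hidden, prefix='gru_')
cell_out, _ = cell.unroll(seq_len, inputs=data, layout='TNC', merge_outputs=True)

# Fused path: a single sym.RNN node (mode='gru') computes the whole sequence.
fused = mx.rnn.FusedRNNCell(num_hidden=num_hidden, num_layers=1,
                            mode='gru', prefix='fused_gru_')
fused_out, _ = fused.unroll(seq_len, inputs=data, layout='TNC', merge_outputs=True)
```

The fused variant packs all per-step GEMMs and element-wise work into one operator, which is what the sym.RNN columns above measure.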

Upstream

  • PR#10104: Merged. This PR is for the fused LSTM operator, which also supports multi-layer and bidirectional computation (a minimal invocation sketch follows this list). The code is done and ready for review. When we tried to refactor the code, including the cuDNN implementation, with NNVM interfaces, a segfault was observed in the MXNet CI environment. The error cannot be reproduced on our local server, but it seems to be caused by the memory-sharing mechanism between forward and backward computation. So we removed the NNVM interfaces from this PR and kept both the CPU path and the GPU path with the legacy registration method.
  • PR#10311: This PR is for the fused GRU operator. Multi-layer and bidirectional computation are also supported. This PR's review and merging depend on the progress of #10104.
  • TODOs: Vanilla RNN support is still WIP.
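
To make the fused operator interface concrete, here is a hedged sketch of calling sym.RNN directly for a 2-layer bidirectional LSTM, as described for PR#10104. The variable names and shapes are illustrative assumptions; the flat parameter blob is a single packed weight/bias tensor whose size is inferred by the operator.

```python
import mxnet as mx

seq_len, batch, input_size = 10, 4, 64   # illustrative sizes
num_hidden, num_layers = 128, 2

data   = mx.sym.Variable('data')        # (seq_len, batch, input_size)
params = mx.sym.Variable('rnn_params')  # packed weights/biases for all layers and directions
init_h = mx.sym.Variable('init_h')      # (num_layers * 2, batch, num_hidden) for bidirectional
init_c = mx.sym.Variable('init_c')      # LSTM cell state, same shape as init_h

# Single fused node covering all layers, both directions, and the full sequence.
out = mx.sym.RNN(data=data, parameters=params, state=init_h, state_cell=init_c,
                 state_size=num_hidden, num_layers=num_layers,
                 bidirectional=True, mode='lstm', name='lstm')
```

With mode='gru' (and no state_cell input) the same node maps to the fused GRU path in PR#10311.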

...