
...

GRU throughput (samples/sec, samples=20):

samples=20 (samples/sec) | GRUCell (SKX-8180) | sym.RNN (SKX-8180) | sym.RNN/GRUCell (8180) | GRUCell (P100) | sym.RNN (P100) | Speedup: sym.RNN(8180)/sym.RNN(P100) | Speedup: sym.RNN(8180)/GRUCell(P100)
Inference | 26.67 | 88.9 | 333% | 40.57 | 357.14 | 25% | 219%
Training (fwd+bwd) | 15.04 | 39.2 | 261% | 27.62 | 140.85 | 28% | 142%
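
For context, below is a minimal sketch of the two code paths being compared: an explicitly unrolled GRUCell graph versus the fused sym.RNN kernel exposed through mx.rnn.FusedRNNCell. The shapes (seq_len, batch, hidden size) are illustrative assumptions, not necessarily the benchmarked configuration.

```python
import mxnet as mx

seq_len, batch, num_hidden = 35, 20, 800  # illustrative sizes, not the measured config
data = mx.sym.Variable('data')            # assumed shape: (seq_len, batch, num_hidden)

# Unrolled GRUCell path: one small per-step subgraph for every time step.
cell = mx.rnn.GRUCell(num_hidden=num_hidden, prefix='gru_')
cell_out, _ = cell.unroll(seq_len, inputs=data, layout='TNC', merge_outputs=True)

# Fused path: a single sym.RNN node (mode='gru') computes the whole sequence.
fused = mx.rnn.FusedRNNCell(num_hidden=num_hidden, num_layers=1,
                            mode='gru', prefix='fused_gru_')
fused_out, _ = fused.unroll(seq_len, inputs=data, layout='TNC', merge_outputs=True)
```

The fused variant packs all per-step GEMMs and element-wise work into one operator, which is what the sym.RNN columns above measure.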

Upstream

  • PR#10104: Merged. This PR is for the fused LSTM operator, which also supports multi-layer and bidirectional computation (a minimal invocation sketch follows this list). The code is done and ready for review. When we tried to refactor the code, including the cuDNN implementation, with NNVM interfaces, a segfault was observed in the MXNet CI environment. The error cannot be reproduced on our local server, but it seems to be caused by the memory-sharing mechanism between forward and backward computation. So we removed the NNVM interfaces from this PR and kept both the CPU path and the GPU path with the legacy registration method.
  • PR#10311: This PR is for the fused GRU operator. Multi-layer and bidirectional computation are also supported. This PR's review and merging depend on the progress of #10104.
  • TODOs: Vanilla RNN support is still WIP.
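
To make the fused operator interface concrete, here is a hedged sketch of calling sym.RNN directly for a 2-layer bidirectional LSTM, as described for PR#10104. The variable names and shapes are illustrative assumptions; the flat parameter blob is a single packed weight/bias tensor whose size is inferred by the operator.

```python
import mxnet as mx

seq_len, batch, input_size = 10, 4, 64   # illustrative sizes
num_hidden, num_layers = 128, 2

data   = mx.sym.Variable('data')        # (seq_len, batch, input_size)
params = mx.sym.Variable('rnn_params')  # packed weights/biases for all layers and directions
init_h = mx.sym.Variable('init_h')      # (num_layers * 2, batch, num_hidden) for bidirectional
init_c = mx.sym.Variable('init_c')      # LSTM cell state, same shape as init_h

# Single fused node covering all layers, both directions, and the full sequence.
out = mx.sym.RNN(data=data, parameters=params, state=init_h, state_cell=init_c,
                 state_size=num_hidden, num_layers=num_layers,
                 bidirectional=True, mode='lstm', name='lstm')
```

With mode='gru' (and no state_cell input) the same node maps to the fused GRU path in PR#10311.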

...