...

  • TensorRT provides significant acceleration of model inference on NVIDIA GPUs compared to running the full graph in MXNet with unfused GPU operators. In addition to faster fp32 inference, TensorRT optimizes fp16 inference and is capable of int8 inference, provided the required quantization steps are performed. Besides increasing throughput, TensorRT significantly reduces inference latency, especially for small batch sizes.
  • This release introduces runtime integration of TensorRT into MXNet to accelerate inference. (#11325)
  • Currently, it lives in the contrib package; see the sketch after this list for typical usage.
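
A minimal sketch of how the contrib integration can be driven, assuming the mx.contrib.tensorrt.tensorrt_bind entry point and the MXNET_USE_TENSORRT switch from the PR above; the resnet18_v2 checkpoint name and input shape are illustrative:

```python
import os
import mxnet as mx

# Opt in to the TensorRT backend before binding the graph
# (assumption: the contrib integration is gated on this variable).
os.environ['MXNET_USE_TENSORRT'] = '1'

batch_shape = (1, 3, 224, 224)

# Illustrative: any symbolic checkpoint works, e.g. one produced by
# HybridBlock.export().
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v2', 0)
arg_params.update(aux_params)
all_params = {k: v.as_in_context(mx.gpu(0)) for k, v in arg_params.items()}

# Bind through TensorRT: supported subgraphs are fused into TensorRT
# engines, the remainder runs as ordinary MXNet GPU operators.
executor = mx.contrib.tensorrt.tensorrt_bind(
    sym, ctx=mx.gpu(0), all_params=all_params,
    data=batch_shape, grad_req='null', force_rebind=True)

# Inference-only forward pass.
output = executor.forward(is_train=False,
                          data=mx.nd.zeros(batch_shape, ctx=mx.gpu(0)))
print(output[0].shape)
```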

New Features - Sparse Tensor

...

Support for Gluon (experimental)

  • Sparse gradient support is added to nn.Embedding (#10924); a usage sketch follows this list
  • Gluon Parameter now supports "row_sparse" stype, which speeds up multi-GPU training (#11001, #11429)
  • Gluon HybridBlock now supports hybridization with sparse operators (#11306)
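
A minimal sketch of the sparse_grad flag on nn.Embedding from the first item above; the vocabulary size, dimensions, and learning rate are illustrative:

```python
import mxnet as mx
from mxnet import autograd, gluon, nd

# Embedding with sparse gradients: backward produces a row_sparse
# gradient, so only the rows actually looked up are touched.
embedding = gluon.nn.Embedding(input_dim=100000, output_dim=128,
                               sparse_grad=True)
embedding.initialize()

# SGD supports row_sparse gradients, so updates skip untouched rows.
trainer = gluon.Trainer(embedding.collect_params(), 'sgd',
                        {'learning_rate': 0.1})

token_ids = nd.array([[3, 17, 42]])
with autograd.record():
    loss = embedding(token_ids).sum()
loss.backward()
trainer.step(batch_size=1)

# Only 3 of the 100000 weight rows are materialized in the gradient.
print(embedding.weight.grad().stype)  # 'row_sparse'
```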

...