...

  • With this feature, MXNet models can now be exported to the ONNX format (#11213). Currently, MXNet supports ONNX v1.2.1. API documentation.
  • Check out this example, which shows how to use the MXNet-to-ONNX exporter APIs; a usage sketch also follows this list. The exporter saves the model as ONNX protobuf so that it can be imported into other frameworks for inference.
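
As an illustration of the exporter, here is a minimal sketch using the mxnet.contrib.onnx API documented for this release; the checkpoint file names are hypothetical:

    import numpy as np
    from mxnet.contrib import onnx as onnx_mxnet

    # A trained MXNet checkpoint (these file names are hypothetical).
    sym_file = 'resnet-18-symbol.json'
    params_file = 'resnet-18-0000.params'

    # input_shape lists one shape per model input; the exporter writes
    # an ONNX protobuf file and returns its path.
    onnx_path = onnx_mxnet.export_model(
        sym=sym_file,
        params=params_file,
        input_shape=[(1, 3, 224, 224)],
        input_type=np.float32,
        onnx_file_path='resnet-18.onnx',
    )
    print('Exported ONNX model to', onnx_path)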

New Features - Topology-aware AllReduce (experimental)

  • This feature uses trees to perform the Reduce and Broadcast operations. It applies the idea of minimum spanning trees to build a binary-tree Reduce communication pattern. This topology-aware approach addresses the single-machine communication limitations of methods such as parameter server and NCCL ring reduction. It is an experimental feature (#11591); a usage sketch follows this list.
  • The implementation follows the paper Optimal message scheduling for aggregation.
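
The feature is enabled through an environment variable rather than an API call; a minimal sketch, assuming the MXNET_KVSTORE_USETREE variable described for this release:

    import os

    # Enable the experimental tree-based Reduce/Broadcast; the release
    # enables it via this environment variable.
    os.environ['MXNET_KVSTORE_USETREE'] = '1'

    import mxnet as mx

    # Multi-GPU training then proceeds as usual, e.g. with a device kvstore.
    kv = mx.kvstore.create('device')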

...

  • TensorRT provides significant acceleration of model inference on NVIDIA GPUs compared to running the full graph in MXNet using unfused GPU operators. In addition to faster fp32 inference, TensorRT optimizes fp16 inference and is capable of int8 inference (provided the quantization steps are performed). Besides increasing throughput, TensorRT significantly reduces inference latency, especially for small batches.
  • This feature introduces runtime integration of TensorRT into MXNet in order to accelerate inference. (#11325)
  • Currently, it lives in the contrib package; see the sketch below.
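
A minimal sketch of the contrib integration, based on the tensorrt_bind API introduced by #11325 (the checkpoint name is hypothetical, and the experimental API may change):

    import os
    import mxnet as mx

    # Turn on the TensorRT runtime integration (contrib, experimental).
    os.environ['MXNET_USE_TENSORRT'] = '1'

    batch_shape = (1, 3, 224, 224)
    sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-18', 0)

    # Gather all weights on the GPU; tensorrt_bind partitions the graph,
    # running supported subgraphs in TensorRT and the rest in MXNet.
    all_params = {k: v.as_in_context(mx.gpu(0)) for k, v in arg_params.items()}
    all_params.update({k: v.as_in_context(mx.gpu(0)) for k, v in aux_params.items()})
    executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0),
                                                 all_params=all_params,
                                                 data=batch_shape,
                                                 grad_req='null',
                                                 force_rebind=True)
    y = executor.forward(is_train=False, data=mx.nd.ones(batch_shape))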

New Features - Sparse Tensor support for Gluon

...

(experimental)

  • Sparse gradient support is added to nn.Embedding (#10924)
  • Gluon Parameter now supports "row_sparse" stype, which speeds up multi-GPU training (#11001, #11429)
  • Gluon HybridBlock now supports hybridization with sparse operators (#11306). These are experimental features; a usage sketch follows this list.
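
To illustrate the sparse-gradient support in nn.Embedding, a minimal sketch (the sizes and optimizer settings are arbitrary):

    import mxnet as mx
    from mxnet import autograd, gluon

    # Embedding with sparse gradients w.r.t. the weight (#10924).
    embed = gluon.nn.Embedding(input_dim=10000, output_dim=128, sparse_grad=True)
    embed.initialize()
    trainer = gluon.Trainer(embed.collect_params(), 'sgd', {'learning_rate': 0.1})

    tokens = mx.nd.array([[1, 5, 42]])
    with autograd.record():
        loss = embed(tokens).sum()
    loss.backward()

    # Only the rows touched by `tokens` carry gradient values.
    print(embed.weight.grad().stype)  # row_sparse
    trainer.step(batch_size=1)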

New Features - Fused RNN Operators for CPU

...