...

Although data parallelism is used in MXNet, its performance is not good enough for operators with low computation cost at inference time, especially at small batch sizes. This phenomenon widely exists in many popular models, such as GoogLeNet, Wide & Deep and Inception V3. For example, in the Wide & Deep model, 26 embedding OPs are executed in sequence and each one consumes very few computing resources. As a result, model-level performance is sub-optimal because of the long execution path through these low-parallelism operators.
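
To make the bottleneck concrete, the following is a minimal MXNet Gluon sketch (not the actual Wide & Deep implementation) of 26 per-field embedding lookups; the vocabulary size, embedding dimension and field layout are assumptions chosen only for illustration. With a small batch, each lookup is a tiny operator, and the lookups are issued one after another, producing the long sequential execution path described above.

```python
# Illustrative sketch only: field count (26), vocab_size and embed_dim are
# assumed values, not taken from the real Wide & Deep model configuration.
import mxnet as mx
from mxnet.gluon import nn

class SparseEmbeddings(nn.Block):
    """One embedding lookup per sparse feature field (26 fields assumed)."""
    def __init__(self, num_fields=26, vocab_size=1000, embed_dim=16, **kwargs):
        super().__init__(**kwargs)
        self.embeddings = nn.Sequential()
        for _ in range(num_fields):
            # Each lookup is a separate, very cheap operator.
            self.embeddings.add(nn.Embedding(vocab_size, embed_dim))

    def forward(self, x):
        # At inference the 26 lookups are issued one after another,
        # forming a long, low-parallelism execution chain.
        outs = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        return mx.nd.concat(*outs, dim=1)

net = SparseEmbeddings()
net.initialize()
batch = mx.nd.zeros((1, 26))      # small batch size (e.g. 1) at inference
print(net(batch).shape)           # -> (1, 416): 26 fields x 16-dim embeddings
```

Since each embedding lookup touches only a handful of rows, none of them can keep the hardware busy on its own; the total latency is dominated by the length of this sequential chain rather than by the work inside any single operator.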

...