...

Although data parallelism is used in MXNet, its performance is not good enough for operators with low computation cost at inference time, especially at small batch sizes. This phenomenon widely exists in many popular models, such as GoogLeNet, Wide & Deep and Inception V3. For example, in the Wide & Deep model, 26 embedding OPs are executed in sequence and each one consumes very few computing resources. As a result, model-level performance is sub-optimal because of the long execution path through these low-parallelism operators.
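
To make the bottleneck concrete, the following is a minimal MXNet Gluon sketch (not the actual Wide & Deep implementation) of 26 per-field embedding lookups; the vocabulary size, embedding dimension and field layout are assumptions chosen only for illustration. With a small batch, each lookup is a tiny operator, and the lookups are issued one after another, producing the long sequential execution path described above.

```python
# Illustrative sketch only: field count (26), vocab_size and embed_dim are
# assumed values, not taken from the real Wide & Deep model configuration.
import mxnet as mx
from mxnet.gluon import nn

class SparseEmbeddings(nn.Block):
    """One embedding lookup per sparse feature field (26 fields assumed)."""
    def __init__(self, num_fields=26, vocab_size=1000, embed_dim=16, **kwargs):
        super().__init__(**kwargs)
        self.embeddings = nn.Sequential()
        for _ in range(num_fields):
            # Each lookup is a separate, very cheap operator.
            self.embeddings.add(nn.Embedding(vocab_size, embed_dim))

    def forward(self, x):
        # At inference the 26 lookups are issued one after another,
        # forming a long, low-parallelism execution chain.
        outs = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        return mx.nd.concat(*outs, dim=1)

net = SparseEmbeddings()
net.initialize()
batch = mx.nd.zeros((1, 26))      # small batch size (e.g. 1) at inference
print(net(batch).shape)           # -> (1, 416): 26 fields x 16-dim embeddings
```

Since each embedding lookup touches only a handful of rows, none of them can keep the hardware busy on its own; the total latency is dominated by the length of this sequential chain rather than by the work inside any single operator.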

...