This method differs from setting the MXNET_CPU_WORKER_NTHREADS environment variable. Our method applies parallelism only to specific OPs, whereas MXNET_CPU_WORKER_NTHREADS applies to all OPs.
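The idea of parallelizing only selected OPs can be sketched with a toy executor. This is a minimal illustration in plain Python, not MXNet code: `PARALLEL_WHITELIST`, `run_ops`, and the toy `embedding` function are hypothetical names, and the whitelisted OPs are assumed to be independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical whitelist: only these OP types are dispatched in parallel.
PARALLEL_WHITELIST = {"embedding"}


def embedding(table, index):
    """Toy stand-in for an embedding lookup: return one row of the table."""
    return table[index]


def run_ops(ops, num_threads=4):
    """Run (op_type, fn, args) tuples; parallelize only whitelisted OP types."""
    results = [None] * len(ops)
    parallel_ids = [i for i, (op_type, _, _) in enumerate(ops)
                    if op_type in PARALLEL_WHITELIST]
    # Whitelisted OPs are assumed independent, so they share one thread pool.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = {i: pool.submit(ops[i][1], *ops[i][2]) for i in parallel_ids}
        for i, future in futures.items():
            results[i] = future.result()
    # Every other OP runs sequentially in the calling thread.
    for i, (op_type, fn, args) in enumerate(ops):
        if op_type not in PARALLEL_WHITELIST:
            results[i] = fn(*args)
    return results


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
ops = [("embedding", embedding, (table, i)) for i in range(3)]
ops.append(("sum", sum, ([1, 2, 3],)))
outputs = run_ops(ops)
```

By contrast, a global setting such as MXNET_CPU_WORKER_NTHREADS would affect the scheduling of every OP, including the `sum` call that this sketch deliberately keeps sequential.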

Addition of New APIs

No new APIs were added or modified.

Backward compatibility

We add a pass to the backend. When the pass is deactivated, there is no backward compatibility issue. When it is active, compatibility with other passes may need to be considered.

Performance

In the wide and deep model, we replace 26 embedding OPs with one parallel_op, as shown in Fig. 1. Table 1 shows the inference performance on one socket of an SKX-8180 with batch size 1 and 28 OMP threads. The parallel OP achieves a 3.7x speedup.

OP               Time cost (ms)
embedding        51240.051
SgParallel_op    13763.959

Table 1. Performance of embedding and SgParallel_op
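The reported speedup follows directly from the two timings in Table 1. The short check below uses only the numbers from the table; the variable names are illustrative.

```python
embedding_ms = 51240.051      # Table 1: embedding
parallel_op_ms = 13763.959    # Table 1: SgParallel_op

speedup = embedding_ms / parallel_op_ms
print(f"speedup: {speedup:.2f}x")   # prints "speedup: 3.72x"
```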

MKL-DNN OPs will be supported once MKL-DNN version 1.0 is adopted. That version makes Intel MKL-DNN primitives stateless and thread safe: the same primitive can be executed in multiple independent threads as long as different threads use different scratchpads. This will allow us to accelerate more models, such as Inception and GoogLeNet.
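The scratchpad rule can be illustrated with a toy example. This is a conceptual sketch in plain Python, not the MKL-DNN API: `scale_and_sum` is a hypothetical stateless primitive that keeps all temporary data in a scratchpad supplied by the caller, and each thread allocates its own scratchpad.

```python
import threading


def scale_and_sum(values, factor, scratchpad):
    """Stateless toy primitive: temporaries live in the caller's scratchpad."""
    for i, v in enumerate(values):
        scratchpad[i] = v * factor       # intermediate results go here
    return sum(scratchpad[:len(values)])


def worker(values, factor, out, slot):
    scratchpad = [0.0] * len(values)     # each thread owns a private scratchpad
    out[slot] = scale_and_sum(values, factor, scratchpad)


inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
results = [None] * len(inputs)
threads = [threading.Thread(target=worker, args=(vals, 2.0, results, i))
           for i, vals in enumerate(inputs)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the primitive holds no state of its own, sharing it across threads is safe here. Sharing a single scratchpad between the threads would reintroduce a data race.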

Test Plan

Tests need to cover two parts. The first is the graph conversion test, which needs to ensure that:

Step    Criterion
1       All OPs are partitioned into one or more subgraphs according to executing mode.
2       Desired patterns are captured and the desired parallel OPs are created.
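The two criteria could be checked along the following lines. This is a simplified sketch on a toy node list, not the actual subgraph pass: `partition`, `convert`, and `PARALLEL_WHITELIST` are hypothetical, and grouping adjacent nodes by mode ignores the real data dependencies that the pass would have to respect.

```python
from itertools import groupby

PARALLEL_WHITELIST = {"embedding"}   # hypothetical whitelist


def executing_mode(node):
    """Classify a node as 'parallel' or 'sequential' by its OP type."""
    return "parallel" if node["op"] in PARALLEL_WHITELIST else "sequential"


def partition(nodes):
    """Split an ordered node list into subgraphs of equal executing mode."""
    return [(mode, list(group))
            for mode, group in groupby(nodes, key=executing_mode)]


def convert(nodes):
    """Replace each multi-node parallel subgraph with one parallel OP node."""
    converted = []
    for mode, group in partition(nodes):
        if mode == "parallel" and len(group) > 1:
            converted.append({"name": "+".join(n["name"] for n in group),
                              "op": "parallel_op",
                              "children": group})
        else:
            converted.extend(group)
    return converted


graph = ([{"name": "data", "op": "input"}]
         + [{"name": f"emb{i}", "op": "embedding"} for i in range(26)]
         + [{"name": "concat", "op": "concat"}])
subgraphs = partition(graph)
converted = convert(graph)

# Criterion 1: every OP lands in exactly one subgraph.
assert sum(len(group) for _, group in subgraphs) == len(graph)
# Criterion 2: the 26 embedding OPs become a single parallel OP.
assert [n["op"] for n in converted] == ["input", "parallel_op", "concat"]
```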

The second part is the unit test for OPs on the parallel OP whitelist. All of these OPs must be thread-safe. The test should cover every supported OP and verify that each produces accurate results.
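One possible shape for such a test is to run each whitelisted OP concurrently and compare the results with a single-threaded reference. The sketch below assumes toy OPs in plain Python. `WHITELIST`, `relu`, `square`, and `matches_reference` are hypothetical names, not part of MXNet.

```python
from concurrent.futures import ThreadPoolExecutor


def relu(xs):
    return [max(0.0, x) for x in xs]


def square(xs):
    return [x * x for x in xs]


# Hypothetical whitelist mapping OP names to toy implementations.
WHITELIST = {"relu": relu, "square": square}


def matches_reference(op, inputs, num_threads=8, repeats=50):
    """Return True if concurrent calls reproduce the single-threaded results."""
    reference = [op(x) for x in inputs]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for _ in range(repeats):
            if list(pool.map(op, inputs)) != reference:
                return False
    return True


inputs = [[-1.0, 0.5, 2.0], [3.0, -4.0], [0.0]]
report = {name: matches_reference(op, inputs) for name, op in WHITELIST.items()}
```

A passing run does not prove thread safety. It only shows that no race was observed in these repetitions, so a real test would likely need larger inputs and more iterations.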

Milestones

  1. Support the structure in Fig. 1.
  2. Support the structure in Fig. 4, in which all OPs to be replaced send their output to one OP.

    Figure 4. Replaced OPs feed into one OP

  3. Support the structure in Fig. 5, in which all OPs to be replaced take their input from OP X.

    Figure 5. Replaced OPs come from one OP

  4. Support all MKL-DNN OPs, and add every OP that supports parallel execution to the whitelist.
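The structures in Fig. 4 and Fig. 5 could be detected roughly as follows. This is a hedged sketch on a toy edge list, not the actual pass: `fan_in_groups`, `fan_out_groups`, `PARALLEL_WHITELIST`, and the node names are hypothetical, with `x` standing in for OP X.

```python
from collections import defaultdict

PARALLEL_WHITELIST = {"embedding"}   # hypothetical whitelist


def fan_in_groups(op_types, edges):
    """Fig. 4 pattern: whitelisted OPs that all feed the same consumer."""
    groups = defaultdict(list)
    for src, dst in edges:
        if op_types[src] in PARALLEL_WHITELIST:
            groups[dst].append(src)
    return {dst: srcs for dst, srcs in groups.items() if len(srcs) > 1}


def fan_out_groups(op_types, edges):
    """Fig. 5 pattern: whitelisted OPs that all read from the same producer."""
    groups = defaultdict(list)
    for src, dst in edges:
        if op_types[dst] in PARALLEL_WHITELIST:
            groups[src].append(dst)
    return {src: dsts for src, dsts in groups.items() if len(dsts) > 1}


op_types = {"x": "input", "e0": "embedding", "e1": "embedding",
            "e2": "embedding", "y": "concat"}
edges = [("x", "e0"), ("x", "e1"), ("x", "e2"),
         ("e0", "y"), ("e1", "y"), ("e2", "y")]

into_one_op = fan_in_groups(op_types, edges)     # candidates for Fig. 4
from_one_op = fan_out_groups(op_types, edges)    # candidates for Fig. 5
```

Each returned group is a candidate set of OPs that a pass could replace with one parallel OP, provided the OPs in the group do not depend on each other.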

References

https://cwiki.apache.org/confluence/display/MXNET/Parallel+Inference+in+MXNet

https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN

https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0