Unified integration with external accelerators

MXNet can integrate with many different kinds of accelerators, including TVM, MKLDNN, TensorRT, Intel nGraph and more. These accelerators generally support only a limited set of operators, so running computation in a model usually involves interaction between accelerator operators and MXNet operators.

These accelerators share some common requirements:

...

Partitioning and execution requirements differ across accelerators. As such, we define the following interface that lets accelerators customize graph partitioning and operator execution.

class SubgraphProperty {
 public:
  // The criteria for selecting the subgraph nodes.
  virtual SubgraphSelectorPtr CreateSubgraphSelector() const = 0;
  // Create an nnvm node for a given subgraph. Here users can customize how to
  // execute the operators in the subgraph.
  virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &s) const = 0;
  // Create a subgraph operator for execution.
  virtual OpStatePtr CreateSubgraphOperator(const nnvm::Symbol &sym) const = 0;
  // The type of the subgraph.
  virtual std::string GetType() const = 0;
};

Step 1: graph partition
Graph partitioning traverses a computation graph and groups operators into subgraphs based on certain rules. NNVM already has a TVM fuse pass, which groups operators into subgraphs based on general rules (e.g., convolution followed by element-wise operations). However, this graph partitioner is TVM-specific and does not work for other accelerators, so we need more graph partitioners. For example, TensorRT and MKLDNN require a partitioner that finds subgraphs matching specific patterns (e.g., convolution followed by batchnorm, followed by activation).

Regardless of the diverse partitioning requirements, we assume all graph partitioning shares the following requirements:

...