...

Partitioning and execution can differ from one accelerator to another. As such, we define the following interface for accelerators to customize graph partitioning and operator execution.

class SubgraphProperty {
public:
  // The criteria of selecting the subgraph nodes.
  virtual SubgraphSelectorPtr CreateSubgraphSelector() const = 0;
  // Create an nnvm node for a given subgraph. Here users can customize how to
  // execute the operators in the subgraph.
  virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &s) const = 0;
};
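For illustration, an accelerator might implement this interface roughly as follows. This is a minimal sketch: SgAcceleratorProperty, the supported-op list, the ContainOpSelector constructor (sketched in Step 1 below), and the use of attrs.parsed to carry the subgraph symbol are illustrative assumptions, not part of the proposal.

class SgAcceleratorProperty : public SubgraphProperty {
public:
  // Select maximal subgraphs consisting of operators the accelerator supports.
  SubgraphSelectorPtr CreateSubgraphSelector() const override {
    // Hypothetical: ContainOpSelector is constructed from the supported op names.
    return std::make_shared<ContainOpSelector>(
        std::unordered_set<std::string>{"Convolution", "BatchNorm", "Activation"});
  }
  // Wrap the subgraph in a single node that runs the default subgraph operator.
  nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym) const override {
    nnvm::NodePtr n = nnvm::Node::Create();
    n->attrs.op = nnvm::Op::Get("_subgraph_op");
    n->attrs.name = "_subgraph_op";
    n->attrs.parsed = sym;  // keep the subgraph so the operator can execute it later
    return n;
  }
};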

Step 1: graph partition
Graph partitioning traverses a computation graph and groups operators into subgraphs based on certain rules. There already exists a TVM fuse pass in NNVM, which groups operators into subgraphs based on general rules (e.g., convolution followed by element-wise operations). However, this graph partitioner is TVM-specific and doesn't work for other accelerators, so we need more graph partitioners. For example, TensorRT and MKLDNN require a partitioner that finds subgraphs with specific patterns (e.g., convolution followed by batchnorm, followed by activation).

Despite these diverse partitioning requirements, we assume all graph partitioning shares the following requirements:

...

class SubgraphSelector {
public:
  // Whether to start a subgraph from the given node.
  virtual bool Select(const nnvm::Node &n) = 0;
  // Whether to add an input node of a selected node to the subgraph.
  virtual bool SelectInput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
  // Whether to add an output node of a selected node to the subgraph.
  virtual bool SelectOutput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
};

All accelerators need a selector that extracts subgraphs containing only the operators they support. As such, we provide a selector called ContainOpSelector for this purpose, sketched below.
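ContainOpSelector can be imagined along these lines (a sketch; the constructor argument and member names are assumptions for illustration):

class ContainOpSelector : public SubgraphSelector {
public:
  explicit ContainOpSelector(std::unordered_set<std::string> ops)
      : supported_ops_(std::move(ops)) {}
  // Start a subgraph at any supported (non-variable) operator.
  bool Select(const nnvm::Node &n) override {
    return !n.is_variable() && supported_ops_.count(n.op()->name) > 0;
  }
  // Grow the subgraph through input nodes that are also supported.
  bool SelectInput(const nnvm::Node &curr_node, const nnvm::Node &new_node) override {
    return !new_node.is_variable() && supported_ops_.count(new_node.op()->name) > 0;
  }
  // Grow the subgraph through output nodes that are also supported.
  bool SelectOutput(const nnvm::Node &curr_node, const nnvm::Node &new_node) override {
    return !new_node.is_variable() && supported_ops_.count(new_node.op()->name) > 0;
  }
private:
  std::unordered_set<std::string> supported_ops_;
};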

To perform graph partitioning, we attach a graph property (a class that implements SubgraphProperty) and invoke PartitionGraph, as sketched below.
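Concretely, the invocation could look like the following. This is a sketch under assumptions: the attribute key "subgraph_property" and the registration of a "PartitionGraph" pass are illustrative details of how the proposal might be wired up.

// Build a graph from the symbol to be partitioned.
nnvm::Graph g;
g.outputs = sym.outputs;
// Attach the property that drives node selection and subgraph-node creation.
std::shared_ptr<SubgraphProperty> property = std::make_shared<SgAcceleratorProperty>();
g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
// Run the pass; supported operators are folded into subgraph nodes.
g = nnvm::ApplyPass(std::move(g), "PartitionGraph");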

...

To customize the subgraph execution, an accelerator needs to provide its own subgraph operator implementation and attach the operator to the subgraph node when SubgraphProperty::CreateSubgraphNode is called. The subgraph operator should be a stateful operator and contain a computation graph. We provide a default subgraph operator implementation (“_subgraph_op”) that executes operators with the MXNet Executor.
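The state behind such an operator might be organized as follows. This is a sketch only: SubgraphOpState, BindSubgraphExecutor, and SubgraphExecutor are hypothetical names standing in for whatever executor machinery the accelerator uses.

// Hypothetical state for a stateful subgraph operator: it owns the subgraph
// symbol and lazily builds an executor for it on the first forward call.
class SubgraphOpState {
public:
  explicit SubgraphOpState(nnvm::Symbol sym) : subgraph_sym_(std::move(sym)) {}
  void Forward(const std::vector<NDArray> &inputs, std::vector<NDArray> *outputs) {
    if (exec_ == nullptr) {
      // Bind the subgraph once; subsequent calls reuse the executor.
      exec_ = BindSubgraphExecutor(subgraph_sym_, inputs);  // hypothetical helper
    }
    exec_->Forward(inputs, outputs);  // hypothetical executor interface
  }
private:
  nnvm::Symbol subgraph_sym_;
  std::shared_ptr<SubgraphExecutor> exec_;  // hypothetical executor type
};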

...



For fast inference in TVM and MKLDNN, the subgraph operators need to maintain a copy of the weight arrays (similar to the closure of a function). In this way, we can convert the data format of the weight arrays and cache the converted arrays inside the subgraph operator to avoid redundant format conversion. The original weight arrays will still be part of the inputs of the subgraph operator. Even though the weight arrays are normally not modified, we still need to handle that case correctly. One solution is to maintain a version number for the var of an NDArray, which is increased by one whenever the NDArray is modified in the execution engine. We can use the version number to determine whether the weight arrays have been modified whenever the subgraph operator is invoked.
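The caching logic could then be sketched as follows, assuming the proposed version counter is exposed through a hypothetical NDArray::version() accessor and ConvertFormat stands in for the accelerator's format conversion:

// Sketch: reuse the converted weight only while its source array is unchanged.
struct CachedWeight {
  uint64_t version = 0;   // version of the source array at conversion time
  NDArray converted;      // weight in the accelerator's preferred format
};

const NDArray &GetConvertedWeight(const NDArray &weight, CachedWeight *cache) {
  if (cache->converted.is_none() || cache->version != weight.version()) {
    // The source weight changed (or was never converted): convert and re-cache.
    cache->converted = ConvertFormat(weight);  // hypothetical format conversion
    cache->version = weight.version();
  }
  return cache->converted;
}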

The benefit of invoking a subgraph inside an operator
Introducing a subgraph operator for TVM and MKLDNN may sound like unnecessary complexity, but it actually reduces the complexity of the integration significantly. By using the subgraph operator, we can completely isolate TVM and MKLDNN operators from MXNet operators as well as from the default MXNet memory planning. Inside the subgraph operators, we don't need to deal with data format conversion and can use a completely different memory plan for the subgraph.

...