Page History

...

The partitioning and execution of these accelerators can be different. As such, we define the following interface for accelerators to customize graph partitioning and subgraph execution inside an operator execution.

class SubgraphProperty {
 public:
  // the criteria of selecting the subgraph nodes.
  virtual SubgraphSelectorPtr CreateSubgraphSelector() const = 0;
  // create an nnvm node for a given subgraph. Here users can customize how to
  // execute the operators in the subgraph.
  virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::SymbolGraph &sg) const = 0;
};

Step 1: graph partition
Graph partitioning is to traverse a computation graph and group operators into subgraphs based on certain rules. There already exists an TVM fuse pass in NNVM, which groups operators into subgraphs based on certain general rules (e.g., convolution followed by element-wise operations). This graph partitioner is TVM-specific. It doesn't work for other accelerators. We need more graph partitioners. For example, TensorRT and MKLDNN requires a partitioner that finds subgraphs with specific patterns (e.g., convolution followed by batchnorm, followed by activation, etc).

Regardless of the diverse partitioning requirements, we assume all graph partitioning shares the following requirements:

...

Given these assumptions, we traverse from every node in a graph and explore their neighbor nodes with rules provided by users. The interface below defines allows users to define the selection rules and users can use it to customize node selection. Given this interface, users can determine which edges to follow to generate a subgraph. Each timeStarting from every node, the graph partitioning algorithm creates a new subgraph selector is called, it sees a new node that connects to one of the nodes in the subgraph and and tries to grow a subgraph. Two criteria need to meet before a node is selected as a starting node:

the node hasn't been selected as part of a subgraph;
Select is called on the node and returns true.

From this starting node, SelectInput and SelectOutput are called to determine if a neighbor node (that hasn't been selected by another subgraph before) should be selected as a candidate for the subgraph. The search continues when a new node is selected as a candidate, and terminates when no more nodes are found. When the process of searching for candidate nodes ends, all of the candidate nodes will passed to Filter to finalize the subgraph. The filter step gives the user the last opportunity to drop out some of the candidate nodes. The selector is stateful. It before, and determine whether to add the node in the subgraph. When traversing from a starting node, a new selector will be created. The selector can change its own state when seeing a new node.

class SubgraphSelector {
 public:
  // Select a starting node for a subgraph.
  virtual bool Select(const nnvm::Node &n) = 0;
  // The two methods below select neighbors in the incoming or outgoing links as candidate nodes of a subgraph.
  virtual bool SelectInput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
  virtual bool SelectOutput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
  // After collecting all candidate nodes for a subgraph, users can use this method to prune some of the nodes
  // to finalize the subgraph.
  virtual std::vector<nnvm::Node*> Filter(nnvm::Graph* g, const std::vector<nnvm::Node*>& candidates) = 0;
};

All of the accelerators will need a selector that We provide a default selector called ContainOpSelector, which extracts a subgraph with operators supported by the accelerators. As such, we provide a selector called ContainOpSelector for this purposean accelerator. This selector or its variant is commonly needed by most of the accelerators.

When a subgraph is found, the partitioning algorithm invokes SubgraphProperty::CreateSubgraphNode to create a new node for the subgraph and connects the new node back to the original graph to replace the subgraph. The subgraph passed to CreateSubgraphNode contains shape/dtype/storage information of the input nodes. This is important because accelerators, such as TVM and TensorRT, need these information to compile and select the best kernel. CreateSubgraphNode allows any customization on the subgraph and the node, which gives users many opportunities the subgraph. For example, TensorRT can optimize a computation graph based on its input shapes and data types in CreateSubgraphNode; some of the accelerators, such as TVM and MKLDNN, can perform another level of graph partitioning for fused operators.

To perform graph partitioning, we attach a graph property (a class that implement SubgraphProperty) and invoke PartitionGraph.

g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
g = ApplyPass(std::move(g), "PartitionGraph");

...

Step 2: subgraph operator (function call)
 Although there are two levels of graph partitioning, we only need to handle one level of subgraphs in the executor because the subgraphs in the second level are fused into operators. We can execute these subgraphs inside special operators, which is specific to the accelerator.
...

Page tree

Versions Compared

Old Version 4

New Version 5

Key