...

Integration with these accelerators should happen at the granularity of subgraphs instead of individual operators. To fuse operators, we need to divide a graph into subgraphs so that the operators in a subgraph can be fused into a single operator. To handle customized data formats, we should partition a computation graph into subgraphs as well. Each subgraph contains only TVM, MKLDNN, or nGraph operators. In this way, MXNet converts data formats only when entering such a subgraph, and the operators inside a subgraph handle format conversion themselves if necessary. This makes the interaction of TVM and MKLDNN with MXNet much easier: neither the MXNet executor nor the MXNet operators need to deal with customized data formats.

Integration with these accelerators may result in two levels of graph partitioning. In the first level, a subgraph only contains the operators supported by the accelerator. In the second level, a subgraph only contains the operators that can be fused.

  • TVM and MKLDNN require two levels of partitioning. We want to isolate TVM/MKLDNN operators from MXNet operators so that we know where to insert TVM/MKLDNN format conversion operators. They also fuse operators to achieve the optimal performance: TVM searches for a global schedule among the fused operators, while MKLDNN operator fusion is done in NNVM.
  • TensorRT and nGraph only need one level of partitioning. TensorRT, for instance, supports only a small set of operators; once we get a TensorRT or nGraph subgraph, the libraries perform a graph transformation internally to fuse operators both vertically and horizontally for better performance, which is hidden from the library users.

Even though invoking these libraries from MXNet requires similar steps, the partitioning rules and subgraph execution of these accelerators can differ. As such, we define the following interface for accelerators to customize graph partitioning and subgraph execution inside an operator.

...

Step 1: graph partition
Graph partitioning traverses a computation graph and groups operators into subgraphs based on certain rules. There already exists a TVM fuse pass in NNVM, which groups operators into subgraphs based on general rules (e.g., convolution followed by element-wise operations). This graph partitioner is TVM-specific and doesn't work for other accelerators. We need more graph partitioners. For example, TensorRT and MKLDNN operator fusion requires a partitioner that finds subgraphs with specific patterns (e.g., convolution followed by batchnorm followed by activation).
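
To make pattern-based partitioning concrete, below is a minimal, hedged sketch (not the actual partitioner) that only detects Convolution → BatchNorm → Activation chains in an NNVM graph using nnvm::DFSVisit. The FindConvBnActChains helper and the operator names it matches are illustrative assumptions; a real partitioner would also extract the matched nodes and rewire the graph around a new subgraph node.

#include <nnvm/graph.h>
#include <string>
#include <unordered_set>

// Illustrative sketch only: detect Convolution -> BatchNorm -> Activation chains,
// the kind of pattern an MKLDNN/TensorRT-style partitioner would group into one subgraph.
std::unordered_set<const nnvm::Node*> FindConvBnActChains(const nnvm::Graph& g) {
  std::unordered_set<const nnvm::Node*> matched;
  nnvm::DFSVisit(g.outputs, [&](const nnvm::NodePtr& n) {
    if (n->is_variable() || n->op()->name != "Activation") return;
    // Walk backwards along the first (data) input: Activation <- BatchNorm <- Convolution.
    const nnvm::NodePtr& bn = n->inputs[0].node;
    if (bn->is_variable() || bn->op()->name != "BatchNorm") return;
    const nnvm::NodePtr& conv = bn->inputs[0].node;
    if (conv->is_variable() || conv->op()->name != "Convolution") return;
    matched.insert(conv.get());
    matched.insert(bn.get());
    matched.insert(n.get());
  });
  return matched;
}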

Regardless of these diverse requirements, we assume all graph partitioning shares the following constraints:

  • all nodes in a subgraph should be connected to one another via incoming links, outgoing links, or both.
  • a node can't belong to two or more subgraphs.

Given these constraints, our graph partitioning algorithm traverses from every node in a graph and explores its neighbor nodes according to rules provided by users. The interface below allows users to define the selection rules, i.e., which edges to follow to generate a subgraph. Starting from every node, the graph partitioning algorithm creates a new subgraph selector and tries to grow a subgraph; a sketch of such a selector is shown after the criteria below. Two criteria need to be met before a node is selected as a starting node:

...
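
As an illustration of the selection rules, here is a hedged sketch of a selector that starts a subgraph at a Convolution node and keeps following outgoing edges into element-wise operators. It assumes the SubgraphSelector-style interface (Select/SelectInput/SelectOutput hooks) proposed in this document; since the exact base-class definition is elided above, the signatures may differ slightly.

// Hedged sketch: grow a subgraph from a Convolution node along outgoing
// element-wise operators. Assumes the SubgraphSelector interface of this proposal.
class ConvElemwiseSelector : public SubgraphSelector {
 public:
  // Whether node n may start a new subgraph.
  bool Select(const nnvm::Node &n) override {
    return !n.is_variable() && n.op()->name == "Convolution";
  }
  // Whether to follow the incoming edge from new_node when n is in the subgraph.
  bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) override {
    return false;  // this rule only grows forward
  }
  // Whether to follow the outgoing edge from n to new_node.
  bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
    return !new_node.is_variable() &&
           (new_node.op()->name == "Activation" ||
            new_node.op()->name == "elemwise_add");
  }
};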

When a subgraph is found, the partitioning algorithm invokes SubgraphProperty::CreateSubgraphNode to create a new node for the subgraph and connects the new node back to the original graph to replace the subgraph. The subgraph passed to CreateSubgraphNode contains the shape/dtype/storage information of the input nodes. This is important because accelerators, such as TVM and TensorRT, need this information to compile and select the best kernel. CreateSubgraphNode allows any customization of the subgraph and the node, which gives users many opportunities to optimize the subgraph. For example, TensorRT can optimize a computation graph based on its input shapes and data types in CreateSubgraphNode; some accelerators, such as TVM and MKLDNN, can perform another level of graph partitioning here for operator fusion.
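
As a hedged sketch of what a property could look like, the class below wraps the subgraph symbol into a node of a hypothetical fused operator "_my_subgraph_op". The attrs.subgraphs convention and the SubgraphSelectorPtr alias are assumptions about the elided interface, not a definitive implementation.

// Hedged sketch of a SubgraphProperty; "_my_subgraph_op" is a hypothetical
// fused operator that the accelerator backend would register separately.
class MySubgraphProperty : public SubgraphProperty {
 public:
  nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
                                   const int subgraph_id) const override {
    nnvm::NodePtr n = nnvm::Node::Create();
    n->attrs.op = nnvm::Op::Get("_my_subgraph_op");  // hypothetical operator
    n->attrs.name = "my_subgraph" + std::to_string(subgraph_id);
    // Keep the subgraph symbol on the node (assumed attrs.subgraphs convention)
    // so the operator can compile and execute it later. Input shapes/dtypes are
    // known here, so a backend like TensorRT could build its engine at this point.
    n->attrs.subgraphs.push_back(std::make_shared<nnvm::Symbol>(sym));
    return n;
  }
  SubgraphSelectorPtr CreateSubgraphSelector() const override {
    return std::make_shared<ConvElemwiseSelector>();  // selector sketched above
  }
};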

To perform graph partitioning, we attach a graph property (a class that implements SubgraphProperty) to the graph and invoke the PartitionGraph pass. This should be done in bind for the symbolic executor and in CachedOp for Gluon hybridize.

g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
g = ApplyPass(std::move(g), "PartitionGraph");
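
For context, a hedged sketch of how these two lines might sit inside bind (or CachedOp), using the hypothetical MySubgraphProperty sketched above:

// Hedged sketch of the surrounding context, e.g. inside Symbol bind.
nnvm::Graph g;
g.outputs = symbol.outputs;  // symbol: the nnvm::Symbol being bound
std::shared_ptr<SubgraphProperty> property = std::make_shared<MySubgraphProperty>();
g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
g = nnvm::ApplyPass(std::move(g), "PartitionGraph");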

...