
This document was written by Lin Yuan

Introduction

The aim of this document is to help developers quickly understand the overall design architecture as well as the code structure of the MXNet backend. It targets readers who have programming experience in C/C++ and would like to contribute to the MXNet backend. We assume that the reader has already perused the 60min Gluon Crash Course and followed the Handwritten Digit Recognition Tutorial to completion.

This guide is not a replacement for the existing How To Implement an Operator tutorial. Rather, it aims to provide readers a comprehensive view of the backend architecture and an understanding of how the individual components are linked together. It is organized to give a complete view at different levels of detail and is not intended to dive straight into a particular area. This guide also provides further references should readers want more details on an individual component.

...

MXNet can be many things to different people, but the underlying novelty is that it is a heterogeneous parallel processing framework with a sophisticated runtime engine that dynamically manages data dependencies, memory, data transfers, and much more. The figure below gives a high-level overview of the user's interaction with MXNet. At the most basic level, users write an app that calls MXNet APIs and defines a context within which their app will be executed. The context defines the number and type of processors (currently CPUs and GPUs are supported). The app defines what computations will be performed. Since MXNet is for deep learning applications, it has APIs for computations like convolution, pooling, and gradient calculation. But it also has APIs for general numerical computation on arrays, such as transpose, inner product, diagonal, etc. Each computation is referred to as an “Operator”. MXNet has an extensive library of operators that are accessible through either an imperative or a symbolic execution model. In either model, the execution engine always represents the set of work as a computational graph, which may be optimized to combine operators and improve performance. And that is where the magic happens.



Figure 1: System overview
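
As a concrete illustration of this interaction, here is a minimal sketch assuming the MXNet 1.x Python API (the operator names are just examples from the library):

import mxnet as mx

ctx = mx.cpu(0)                    # the context: which processor executes the work
a = mx.nd.ones((2, 3), ctx=ctx)    # an NDArray allocated on that context
b = mx.nd.transpose(a)             # "transpose" is one operator from the library
c = mx.nd.dot(a, b)                # "dot" is another; the engine tracks its dependency on a and b
print(c.asnumpy())                 # blocks until the engine has finished the computation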

MXNet Engine Architecture

The MXNet engine takes a computation graph and an operator library and manages execution as defined by the dependencies and other constraints (like memory, data transfer, etc.). The computation graph represents the work that the user defined in their application (the parts/layers of their neural network). The engine hands out work to the set of processors defined in the context according to the dependencies in the graph. The operator library provides the low-level implementation details of each operator on the various computing platforms.

At a high level, there are two execution flows based on the MXNet execution model: Symbolic and Imperative. They both create a computation graph, allocate the required memory, set up the context and invoke the execution engine to run the computation graph. Figure 2 shows the sequence of execution in these two modes:



Figure 2: Execution flow
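
The contrast between the two flows can be sketched from the user's side as follows (a rough example assuming the MXNet 1.x APIs: mx.nd for imperative execution, mx.sym plus an executor for symbolic execution):

import mxnet as mx

# Imperative: each operator call is handed to the engine as it is encountered.
x = mx.nd.ones((2, 2))
y = mx.nd.relu(x + 1)
print(y.asnumpy())

# Symbolic: declare the graph first, then bind it to a context and execute it.
data = mx.sym.Variable('data')
net = mx.sym.relu(data + 1)
exe = net.bind(ctx=mx.cpu(), args={'data': mx.nd.ones((2, 2))})
exe.forward()
print(exe.outputs[0].asnumpy())

In the symbolic case the whole graph is known to the engine before execution, so graph-level optimizations can be applied; in the imperative case the graph is built up as the calls arrive.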

...


Computation Graph

The computation graph models the dependencies between operators at a high level. Nodes can represent inputs, outputs and operators; edges represent the data dependencies between nodes. Figure 3 below shows an example of a computation graph for an operation containing only a single Softmax operator. Notice that the input data is operated on by the softmax node (this is the forward pass) and results in output 1. Then this output 1 is passed to the backward_softmax operator along with ograd (which is the gradient from any downstream operators; there aren't any in this example) and results in output 3. This final output is the igrad, or input gradient, that would be passed upstream to the next operator (if there were any).

Figure 3: Computation graph of softmax
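
The forward/backward flow in Figure 3 can be reproduced from Python with the autograd API; the sketch below assumes MXNet 1.x and relies on backward() supplying an all-ones ograd when no head gradient is given:

import mxnet as mx
from mxnet import autograd

x = mx.nd.array([[1.0, 2.0, 3.0]])
x.attach_grad()                    # ask the engine to produce an igrad for x
with autograd.record():
    out = mx.nd.softmax(x)         # forward pass: the softmax node
out.backward()                     # backward pass: the backward_softmax node, fed an all-ones ograd
print(x.grad.asnumpy())            # the input gradient (igrad) passed upstream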


In the code, the Graph data structure is defined in the nnvm package from the DMLC/tvm project. The goal of the Neural Network Virtual Machine (nnvm) is to take a graph representation of workloads from different frameworks and translate the high-level graphs into execution graphs.

The symbolic and imperative execution modes have different routines to create and execute the graph. In symbolic mode, the graph is created and executed using routines defined in the GraphExecutor class. In imperative mode, the routine is defined in the Imperative namespace.

Furthermore, the graph is manipulated and transformed through a series of passes defined in nnvm::pass_functions.h. The following figure shows an overview of this process.

Figure 4: Graph pass function call


The graph pass functions are defined in tvm/nnvm/src/pass.

...

In this section we describe the implementation of the imperative NDArray use model of MXNet in Python. As mentioned before, operators are implemented in C/C++ and registered through mechanisms defined in the dmlc-core package. DMLC-core is a common bricks library for building scalable and portable distributed machine learning. It defines a common way to register operators, their input arguments, output arguments, memory requirements, etc. so that they can all be integrated into a common library. This provides the engine a streamlined interface for executing the operators.

At runtime, when the mxnet package is imported in a user's Python program (i.e. 'import mxnet as mx'), it calls some initialization routines that create Python wrappers for the C/C++ operators on the fly from the registration information described above. This has a few benefits, such as minimal code overhead and reducing the number of places where changes have to be made. The _make_ndarray_function routine generates the Python code text on the fly when the mxnet module is loaded.
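
You can observe the result of this from an interpreter; the check below is a hedged example (the exact module layout of the generated code can differ across MXNet versions), showing that the wrappers are ordinary Python functions created at import time rather than files in the source tree:

import mxnet as mx

print('relu' in dir(mx.nd))          # True: the wrapper exists right after import
print(mx.nd.relu.__doc__[:200])      # its docstring is assembled from the C++ registration info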

 
If you're not familiar with this concept, it can seem quite magical at first. But when you remember that Python is a dynamic, interpreted language, you realize that it's not so magical. Rather than compiling all the code ahead of time, it is parsed as the program executes. This allows code to be created as needed (after the package is imported with “import mxnet” but before you call any functions such as “mxnet.nd.conv(a)”).

The Python wrapper is literally generated from scratch on the fly; it does not exist anywhere on disk in the MXNet sources. A Python function registers the new function in the global namespace by getting the info from the operator registration in C via the getSymbolInfo function. The code snippet below shows a string (“def %s(%s):”) being created that looks oddly like a Python function definition. That's because it is! Based on the operator registration info, the Python wrapper for that operator is generated by appending text together.
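
The mechanism can be summarized with a simplified, purely illustrative sketch (this is not MXNet's actual code, and _invoke_c_operator below is a hypothetical stand-in for the call into the C API):

def make_wrapper(op_name, arg_names):
    # Build the wrapper's source text from the registration info, then exec it.
    signature = ', '.join(arg_names)
    code = 'def %s(%s):\n    return _invoke_c_operator("%s", [%s])\n' % (
        op_name, signature, op_name, signature)
    scope = {'_invoke_c_operator': lambda name, args: (name, args)}  # hypothetical stand-in
    exec(code, scope)              # turn the generated text into a real function object
    return scope[op_name]

relu = make_wrapper('relu', ['data'])
print(relu(42))                    # ('relu', [42])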

...

The shape inference in the engine is done through multiple topological traversals of the computation graph as shown below.
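
From the user's side, the result of this inference can be inspected through the Symbol API; the example below is a hedged sketch assuming MXNet 1.x, while the traversal itself is illustrated in Figure 5:

import mxnet as mx

data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data, num_hidden=10, name='fc')
arg_shapes, out_shapes, aux_shapes = fc.infer_shape(data=(32, 100))
print(dict(zip(fc.list_arguments(), arg_shapes)))   # e.g. {'data': (32, 100), 'fc_weight': (10, 100), 'fc_bias': (10,)}
print(out_shapes)                                   # e.g. [(32, 10)]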


...

Figure 5: Attribute inference in computation graph

In each traversal, the shape and data type of each node are inferred from its input and/or output nodes; if an attribute of a node cannot be inferred during the pass, it is left as unknown for now. The graph traversal then repeats until all of the nodes' attributes have been inferred, or until the number of unknown attributes can no longer be reduced, in which case an exception is thrown. Consider the example in Figure 5. This computation graph defines a linear algebra operation:

...