Many neural network models in fields such as natural language processing and graph analysis are dynamic. The dynamics in these models are usually expressed with control flow. There are two potential solutions for handling dynamic models: (a) TensorFlow provides symbolic control flow operators to support dynamics in the model; (b) frameworks with an imperative programming interface (e.g., Gluon, PyTorch, TensorFlow Eager) can simply use Python control flow. Both solutions have their advantages and disadvantages:

  • Solution (a): its advantage is that it is easy to implement efficiently and easy to deploy a model built with control flow operators. However, it is much more difficult to implement and debug a model using this interface.
  • Solution (b): it is very intuitive to implement a model this way. However, it is difficult to achieve good efficiency and difficult to export the model for deployment. When a dynamic model is implemented with the imperative interface, the framework has to construct a dynamic computation graph on the fly; this graph construction causes significant overhead and is difficult to optimize. The other problem is that a dynamic computation graph doesn't represent the entire neural network model, and thus cannot be exported for deployment.

Despite its disadvantages, the current trend is that the second solution is more widely adopted, thanks to its intuitive programming interface.

Gluon tries to close the gap between the imperative programming interface and the symbolic interface. It allows machine learning scientists to design and implement their models as if using the imperative interface, and turns those implementations into symbolic ones when hybridize() is invoked. Hybridization can achieve performance close to that of the pure symbolic interface and makes it easy to export models for deployment. However, the current implementation of Gluon has some limitations. For example, hybridization in Gluon doesn't support implementations with control flow.
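
As a point of reference, the snippet below is a minimal sketch of how hybridization is used today on a model without control flow (standard Gluon API; the layer sizes and input shape are arbitrary):

import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation='relu'))
net.add(nn.Dense(10))
net.initialize()

net.hybridize()                            # switch from imperative to symbolic execution
x = mx.nd.random.uniform(shape=(8, 100))
y = net(x)                                 # the first call builds and caches the symbolic graph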

This proposal addresses this limitation in Gluon. Overall, the goal of the project is to turn a dynamic graph that contains the computation of a neural network into a static graph (where the dynamics are expressed by control flow operators). In this way, we can easily hybridize dynamic models in Gluon and export them for deployment. To achieve this goal, the first step is to add symbolic control flow operators to MXNet. This also brings some additional benefits:

  • Adding control flow operators makes the backend of MXNet (i.e., NNVM) as expressive as most programming languages, which enables us to get rid of Python for inference and deploy MXNet in virtually any environment. This is important when MXNet is deployed for inference.
  • Control flow operators improve the speed of both training and inference for dynamic models. Simply moving loops to the backend saves the Python execution overhead. More importantly, adding control flow operators is an important step toward many optimizations. For example, we can handle variable-length sequences efficiently by avoiding unnecessary computation (this can be more efficient than bucketing, which can't be used by Gluon). We can also integrate with TVM to accelerate dynamic models: with control flow operators in computation graphs, we can represent a dynamic neural network as a static graph. Control flow operators also help NNVM understand the computation and data flow in a model, which gives TVM more useful information to optimize for speed.

The main tasks in this proposal are:

  • To add imperative and symbolic control flow operators to MXNet. Although Python control flow is more intuitive to use, we need imperative control flow operators so that we can easily switch between imperative and symbolic implementations in Gluon. Currently, I'll add four control flow operators: ``ifelse’’, ``case’’, ``foreach’’ and ``while_loop’’. Detailed descriptions of these operators are attached at the end of this proposal.
  • To add support for hybridizing Gluon RNN models and variable-length sequences. These are common applications where control flow operators are required. This task is to rewrite the existing Gluon RNN implementation using control flow operators.
  • To go a step further and add support for building static graphs directly from any Gluon model that uses Python control flow. As such, we could export any model implemented with Gluon. This task will be exploratory; it might be difficult to handle all Python code.
  • To use TVM to compile and optimize dynamic models. Control flow operators expose the computation flow, so we can reorganize subgraphs (e.g., the ones in a loop body) and pass them to TVM for compilation to achieve better performance.

====================================== Attachment ================================================

The proposed APIs of the flow control operators are listed below:

``ifelse’’

``ifelse’’ invokes different computations based on a certain condition.

ifelse(cond, if_func, else_func, inputs)


Input arguments:

  • ``cond’’ is a user-defined function that returns a boolean scalar symbol to define which function to compute.
  • ``if_func’’ and ``else_func’’ are also user-defined functions whose signature is defined below.
  • ``inputs’’ is a list of symbols that represent NDArrays.

Return value:

  • A list of symbols returned from one of the user-defined functions.

The signature of ``cond’’ is

def func(inputs): boolean symbol

where ``inputs’’ is the same as ``inputs’’ of ``ifelse’’ and the return value is a boolean scalar symbol.

The signature of ``if_func’’ and ``else_func’’ is

def func(inputs): outputs

where ``inputs’’ is the same as ``inputs’’ of ``ifelse’’ and ``outputs’’ is a list of symbols that represent NDArrays.

``if_func’’ and ``else_func’’ should return the same number of outputs with the same types and shapes. This is compatible with ``cond'' in TensorFlow, except for the additional restriction on shapes.
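
For illustration only, the sketch below shows how the proposed ``ifelse’’ might be used; the operator is not part of MXNet yet, so the call itself and the free symbol names (``a’’, ``b’’) are assumptions of this sketch:

import mxnet as mx

a = mx.sym.var('a')
b = mx.sym.var('b')

def cond(inputs):
    # assumed to yield a boolean scalar symbol
    return inputs[0] > inputs[1]

def if_func(inputs):
    return [inputs[0] + inputs[1]]

def else_func(inputs):
    return [inputs[0] - inputs[1]]

# hypothetical call to the proposed operator
outs = ifelse(cond, if_func, else_func, [a, b])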

``case’’

``case’’ is a general version of ``ifelse’’. It can also be viewed as syntactic sugar to simplify conditions with multiple branches. We can use arguments similar to the ones used by TensorFlow:

case(pred_fn_pairs, default, exclusive)


Input arguments:

  • pred_fn_pairs: {pred1: fun1, ... }
  • default: the default function if none of the predicates are true.
  • exclusive: a boolean Python scalar that indicates whether all predicates are evaluated.

Return:

  • The result from one of the user-defined functions in pred_fn_pairs.

``pred1’’, ``pred2’’, ... have the same signature as ``cond’’ in ``ifelse’’ and ``fun1’’, ``fun2’’, ... ``default’’ have the same signature as ``if_func’’ and ``else_func’’ in ``ifelse’’.

``case’’ has the same restriction in the input functions as ``ifelse’’, i.e., they should return the same number of outputs with the same types and shapes.
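
For illustration, a sketch of the proposed ``case’’ is shown below. How data is passed to the predicates and branch functions is left open in this proposal, so this sketch assumes they close over the free symbol ``x’’; the operator itself does not exist in MXNet yet:

import mxnet as mx

x = mx.sym.var('x')

def is_negative(inputs):
    return x < 0          # assumed to yield a boolean scalar symbol

def is_large(inputs):
    return x > 10

def negate(inputs):
    return [x * -1]

def shift(inputs):
    return [x - 10]

def identity(inputs):
    return [x]

# hypothetical call to the proposed operator
outs = case({is_negative: negate, is_large: shift},
            default=identity, exclusive=False)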

``foreach’’

``foreach’’ is a special form of loop. It's designed to allow easy shape inference and other optimizations. It iterates over the first dimension of the input NDArray, so the number of iterations is determined before entering the loop.

foreach(func, input, state, back_prop)


Input arguments:

  • ``input'' is a symbol that represents an NDArray.
  • ``func’’ is a user-defined function that defines computation for each iteration.
  • ``state’’ is a list of NDArrays passed to ``func’’ as part of the input for the first iteration.
  • ``back_prop’’ indicates whether there is a backward computation for the loop. When backward is required, ``foreach’’ needs to keep the outputs of each iteration.

Return values:

  • A tuple (out_data, state), where ``out_data’’ is an NDArray that is the concatenation of all outputs from ``func’’ and ``state’’ is the output state of the last iteration.


The signature of ``func’’ is

def func(input, state): output, new_state

``input'' is a symbol that contains an element (a slice along the first dimension) of the input array of ``foreach’’; ``state'' is a list of symbols that represent data from the previous iteration; ``output'' is a symbol that contains the output data generated in this iteration; ``new_state'' is a list of symbols that contain data passed to the next iteration. The ``output'' of each iteration is concatenated into a single NDArray as the output of ``foreach’’. As such, the shape and type of the output of each iteration should always be the same. ``func'' is invoked only once, to generate a symbol that represents the computation in the function.

``foreach’’ is similar to ``scan’’ in TensorFlow. The only difference is that ``func’’ in ``foreach’’ has two types of outputs: one is concatenated as the output of ``foreach’’; the other outputs are passed to ``func’’ as input for the next iteration. In contrast, ``scan’’ concatenates the outputs of ``func’’ as the output of ``scan’’ and also passes them to ``func’’ as one of the inputs for the next iteration. This difference makes the implementation of LSTM with ``foreach’’ simpler and more efficient than ``scan’’ in TensorFlow.
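
The sketch below illustrates the proposed ``foreach’’ with a running sum over the first dimension of the input; the operator and its exact signature follow this proposal and do not exist in MXNet yet, so the call itself is an assumption:

import mxnet as mx

def step(input, state):
    # ``input'' is one slice along the first dimension of ``data''
    new_sum = input + state[0]
    return new_sum, [new_sum]            # (output, new_state)

data = mx.sym.var('data')                # e.g., shape (seq_len, batch, feature)
init = mx.sym.var('init')                # e.g., shape (batch, feature)

# hypothetical call to the proposed operator
out_data, last_state = foreach(step, data, [init], back_prop=True)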

``while_loop’’

``while_loop’’ is the general form of a loop: at the beginning of each iteration, it checks a condition function to determine whether the loop terminates. As such, it is difficult to determine the number of iterations in advance, and ``while_loop’’ is more difficult to optimize than ``foreach’’.

while_loop(cond, func, loop_vars, back_prop, max_iterations)


Input arguments:

  • ``cond’’ is a user-defined function that takes ``loop_vars’’ as input and returns a boolean scalar symbol to determine the termination of the loop.
  • ``func’’ is a user-defined function that takes ``loop_vars’’ as input and performs computation of an iteration. There are two potential signatures as described below.
  • ``loop_vars’’ is a list of symbols that represent NDArrays.
  • ``back_prop’’ indicates whether there is a backward computation for the loop. When backward is required, ``while_loop’’ needs to keep the outputs of each iteration.
  • ``max_iterations’’ is a Python scalar or an MXNet scalar symbol that defines the maximal number of iterations. When ``max_iterations’’ is a Python scalar, the maximal number of iterations is defined statically (when the computation graph is constructed); when ``max_iterations’’ is an MXNet scalar symbol, it is defined at runtime (when the computation graph is executed).

Return value:

  • Depending on the signature of ``func’’, there are two potential ways of returning values.

The signature of ``cond’’:

def cond(loop_vars): boolean scalar


There are two options for the signature of ``func’’:
Option 1:

def func(loop_vars): new_loop_vars

In this option, we only require the arrays in ``loop_vars'' to have the same types across iterations; their shapes can change. ``while_loop’’ returns the return values of the last invocation of ``func’’. This interface is similar to the one in TensorFlow and is very flexible.

Option 2:

def func(loop_vars): (output, new_loop_vars)

In this option, the ``output'' from each iteration is concatenated and returned as the output of ``while_loop’’. We can require ``output'' to have the same shape and data type in every iteration. We would probably also require the arrays in ``new_loop_vars'' to have the same shapes and data types as the ones in ``loop_vars''. This interface is similar to the definition of ``Loop'' in ONNX and is more restrictive.

For both options, it is difficult to infer the shape of the output of ``while_loop’’ statically, because we cannot determine the number of iterations required by ``while_loop’’ in advance. For the second option, even though we can't infer the shape, we may still be able to determine the maximal memory size used by the output of the loop. As such, the second option might be preferred even though it's more restrictive.
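
For illustration, the sketch below uses the second option of ``while_loop’’ to accumulate a sum until a counter reaches a limit; the operator and its signature follow this proposal and do not exist in MXNet yet, so the call itself is an assumption:

import mxnet as mx

def cond(loop_vars):
    i, total = loop_vars
    return i < 5                              # assumed to yield a boolean scalar symbol

def func(loop_vars):
    i, total = loop_vars
    new_total = total + i
    # (output, new_loop_vars): ``new_total'' is concatenated across iterations
    return new_total, [i + 1, new_total]

i = mx.sym.var('i')
total = mx.sym.var('total')

# hypothetical call to the proposed operator
outputs, final_vars = while_loop(cond, func, [i, total],
                                 back_prop=True, max_iterations=5)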
