...

Feature Shepherd

TBD

Problem

Previously, MXNet only supported custom operators written in higher-level languages (i.e. Python, Java/Scala, etc.) via the Custom Op interface: https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html?highlight=customop. This made it complicated to add high-performance routines written in C++ and CUDA. One solution was the MobulaOP project: https://github.com/wkcn/MobulaOP, which enabled a seamless experience for loading these high-performance C++ and CUDA routines, built on top of the Custom Op interface. That project was very successful, and we propose to integrate its concepts and design directly into MXNet. In this project we will implement a custom operator and dynamic library loader in the MXNet engine, enabling custom high-performance ops to be leveraged from all language bindings, without the overhead of the engine using callbacks at runtime.

...

Lastly, we want custom operators to be first-class operators and have access to all the capabilities that internal MXNet operators do. One example is enabling custom operators to leverage the MXNet resource manager for storage and memory.

...

Approach

Compiling Custom Operators

To support compiling custom operators, we need to construct a simple API/file-set that users will compile their custom operators with. The result of this compilation will be a dynamic library (Linux: *.so, Windows: *.dll). We will need to provide unit tests that allow users to test their operator registration outside of MXNet to ease debugging.

Just as operators are registered in MXNet with NNVM, we propose a similar lightweight approach that doesn't require compiling custom operators against NNVM.
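As a sketch of what such NNVM-free registration could look like, the snippet below uses the static-initialization trick: a macro appends an entry to a registry object that lives inside the custom operator library itself, so registration runs as soon as the library is loaded. All names here (OpEntry, OpRegistry, REGISTER_OP, myParseAttrs) are illustrative stand-ins, not MXNet's actual implementation.

```cpp
#include <list>
#include <map>
#include <string>

// Signature for the attribute-parsing callback (mirrors the proposal's parseAttrs).
typedef int (*parseAttrs_t)(std::map<std::string, std::string>, int*, int*);

struct OpEntry {
  std::string name;
  parseAttrs_t parse_attrs = nullptr;
  OpEntry& setParseAttrs(parseAttrs_t fn) { parse_attrs = fn; return *this; }
};

// Registry local to the custom operator library; a std::list keeps
// references to entries stable as more operators are added.
struct OpRegistry {
  static OpRegistry& get() { static OpRegistry inst; return inst; }
  OpEntry& add(const char* name) {
    entries.emplace_back();
    entries.back().name = name;
    return entries.back();
  }
  std::list<OpEntry> entries;
};

// Static-initialization trick: registration runs when the .so/.dll is
// loaded, with no dependency on NNVM headers.
#define REGISTER_OP(Name) \
  static OpEntry& __op_reg_##Name = OpRegistry::get().add(#Name)

// Example operator: accepts one input, produces one output.
int myParseAttrs(std::map<std::string, std::string> attrs,
                 int* num_in, int* num_out) {
  *num_in = 1;
  *num_out = 1;
  return 1;  // 1 = success
}

REGISTER_OP(my_op).setParseAttrs(myParseAttrs);
```

Because the registry is an ordinary C++ object inside the library, MXNet can later walk it through a small C interface without the library ever linking against MXNet internals.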

...

After a library is loaded, users need to call their operators from their application. We'll register custom operators in the same ndarray and symbol namespaces that regular operators use, providing the same user experience as built-in operators.

...

The figure below shows the high-level architecture proposed. The user will call the mx.library.load API to load their custom operator library. This will result in the operators being discovered in the .so/.dll and registered into MXNet. Then the user will call their operator directly, just like they would any regular MXNet operator.

[Figure: proposed high-level architecture]

When building a custom operator library, users will write 4 functions for each operator: Forward, InferShape, InferType, and ParseAttrs. These are similar to the standard functions required for MXNet's current backend C/C++/CUDA operators. They will then register their op (i.e. the 4 functions) in the library. As shown above, this "local registration" will be parsed by MXNet when loading the custom operator library at runtime.

[Figure: operator registration in the custom operator library]

Runtime Behavior

Here is the overall runtime behavior for custom operators. It is broken down into 2 parts: initial library load, and operator execution.

First, the user writes their custom op functions: Forward, InferShape, InferType, and ParseAttrs. Then they statically register these functions in their library with REGISTER_OP. Next they compile and produce a shared library (.so/.dll). Then they start MXNet and load their library: during initial setup, the user calls mx.library.load in their code to load the shared library. During the loading process, MXNet parses all of the operators that have been registered, first getting the number of registered ops with the _opRegSize function. Then it iteratively gets each op by calling _opRegGet and analyzes it before re-registering it inside MXNet's NNVM registry.
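The discovery handshake described above could look roughly like the sketch below, assuming the library exposes C symbols named _opRegSize and _opRegGet as in the proposal. The signatures and the discoverOps helper are illustrative assumptions; in MXNet the two function pointers would be obtained via dlopen/dlsym rather than direct linkage as done here.

```cpp
#include <string>
#include <vector>

// ---- library side: a minimal local registry of op names ----
static std::vector<std::string> op_names = {"gemm", "my_op"};

extern "C" int _opRegSize() { return static_cast<int>(op_names.size()); }
extern "C" void _opRegGet(int idx, const char** name) {
  *name = op_names[idx].c_str();
}

// ---- MXNet side: after dlopen/dlsym these would be function pointers ----
typedef int (*opRegSize_t)();
typedef void (*opRegGet_t)(int, const char**);

std::vector<std::string> discoverOps(opRegSize_t regSize, opRegGet_t regGet) {
  std::vector<std::string> found;
  int n = regSize();               // how many ops the library registered
  for (int i = 0; i < n; ++i) {
    const char* name = nullptr;
    regGet(i, &name);              // fetch each op entry by index
    found.push_back(name);         // re-register into MXNet's registry here
  }
  return found;
}
```

The key design point is that only plain C symbols cross the library boundary, so the custom operator library needs no MXNet headers or ABI-sensitive C++ types.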


Later, when an operator from the library is bound and executed, the functions from the shared library are called. During the bind step, the operator is looked up by name and its attributes are analyzed by the operator's parseAttrs function in the shared library. For type and shape inference, the respective functions are called through the inferType and inferShape APIs. Lastly, when executing the forward pass, the Forward function for the operator is called from the shared library.


New MXNet APIs

These are new APIs that are added to MXNet.

...



C APIs

  • MXLoadLib - API to load operator libraries
    • Checks the library version number
    • Calls initialize on the library
    • Loads the custom operator library
    • Goes through the operators in the library
    • Checks that each operator defines the required functions
      • ParseAttrs, InferType, InferShape, Forward
    • Registers each operator found

Python APIs

  • load - API to load operator libraries
    • Takes a path to the operator library
    • Checks that the path exists and points to a file
    • Calls the C API MXLoadLib to perform the actual loading
    • mx.library.load('/path/to/libtest.so')

New CustomOp Operator

  • CustomOp - new operator that executes custom operators loaded from the library
    • Takes op_type to identify custom operator name
    • Takes any number of kwargs as attributes/parameters
    • Takes any number of in-order args as input arrays
    • b = mx.nd.CustomOp(a,op_type='sam',myParam='2')

APIs for implementing Custom Operators

...

  • parseAttrs - takes a set of key/value pairs for attributes and gives users an opportunity to validate the attributes passed to their custom operator.
    • int parseAttrs(std::map<std::string, std::string> attrs, 
      int* num_in,
      int* num_out);
    • Inputs: the map of attributes passed to the operator from the user
    • Outputs: num_in, num_out - the number of input/output arrays required for this operator
    • Returns 1 on success, 0 on failure
  • inferType - performs type inference for this operator
    • int inferType(std::map<std::string, std::string> attrs, 
      std::vector<int> &intypes,
      std::vector<int> &outtypes);
    • Inputs: the map of attributes
    • Inputs/Outputs: intypes, outtypes - the lists of input/output types to be inferred. Values of -1 should be replaced by this operator with a specific type
    • Returns 1 on success, 0 on failure
  • inferShape - performs shape inference for this operator
    • int inferShape(std::map<std::string, std::string> attrs, 
      std::vector<std::vector<unsigned int>> &inshapes,
      std::vector<std::vector<unsigned int>> &outshapes);
    • Inputs: the map of attributes
    • Inputs: inshapes - the shapes of the input arrays
    • Outputs: outshapes - the shapes of the output arrays
    • Returns 1 on success, 0 on failure
  • forward - performs the forward-pass computation of this operator
    • int forward(std::map<std::string, std::string> attrs, 
      std::vector<MXTensor> inputs,
      std::vector<MXTensor> outputs,
      OpResource res);
    • Inputs: the map of attributes
    • Input data: inputs, input tensors
    • Output data: outputs, output tensors
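To make the four callbacks concrete, here is a sketch of a hypothetical element-wise "scale" operator implemented against the signatures above. MXTensor and OpResource are minimal local stand-ins for this sketch (the real types would come from the proposed library header), and the "scale" attribute and operator itself are invented for illustration.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

struct MXTensor {            // stand-in: dense float tensor
  float* data;
  std::vector<unsigned int> shape;
  size_t size() const { size_t s = 1; for (unsigned int d : shape) s *= d; return s; }
};
struct OpResource {};        // stand-in for the resource manager handle

// Validate attributes; declare 1 input and 1 output array.
int parseAttrs(std::map<std::string, std::string> attrs,
               int* num_in, int* num_out) {
  if (attrs.count("scale") == 0) return 0;  // required attribute missing
  *num_in = 1;
  *num_out = 1;
  return 1;
}

// Output dtype matches input dtype.
int inferType(std::map<std::string, std::string> attrs,
              std::vector<int>& intypes, std::vector<int>& outtypes) {
  outtypes[0] = intypes[0];
  return 1;
}

// Element-wise op: output shape matches input shape.
int inferShape(std::map<std::string, std::string> attrs,
               std::vector<std::vector<unsigned int>>& inshapes,
               std::vector<std::vector<unsigned int>>& outshapes) {
  outshapes[0] = inshapes[0];
  return 1;
}

// Forward pass: out[i] = scale * in[i].
int forward(std::map<std::string, std::string> attrs,
            std::vector<MXTensor> inputs, std::vector<MXTensor> outputs,
            OpResource res) {
  float s = static_cast<float>(std::atof(attrs["scale"].c_str()));
  for (size_t i = 0; i < inputs[0].size(); ++i)
    outputs[0].data[i] = s * inputs[0].data[i];
  return 1;
}
```

At bind/execute time, MXNet would call these in the order described earlier: parseAttrs first, then inferType and inferShape, and finally forward on each execution.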

...

  • REGISTER_OP - registers the operator in the library
    • REGISTER_OP(sam)
      .setForward(myFCompute)
      .setParseAttrs(parseAttrs)
      .setInferType(inferType)
      .setInferShape(inferShape);
    • REGISTER_OP - macro that defines a custom operator object with the given name
    • setForward - sets the forward computation function
    • setParseAttrs - sets the parse-attributes function
    • setInferType - sets the infer-types function
    • setInferShape - sets the infer-shapes function

Goals/Usecases


Open Questions

Proposed Approach

Example Custom Operators

Examples of creating custom operators, building them into a library, and loading them at runtime to test them can be found here:

https://github.com/apache/incubator-mxnet/tree/master/example/extensions/lib_custom_op

The GEMM example contains two operators. The stateless operator demonstrates a regular operator here:

https://github.com/apache/incubator-mxnet/blob/master/example/extensions/lib_custom_op/gemm_lib.cc#L169-L174


Addition of New APIs

Backward compatibility

Performance Considerations

Test Plan

Alternative Approaches


Technical Challenges 


Milestones

...


The example GEMM stateful operator is here:

https://github.com/apache/incubator-mxnet/blob/master/example/extensions/lib_custom_op/gemm_lib.cc#L220-L225

The example build command to build the GEMM operators into a library is here:

https://github.com/apache/incubator-mxnet/blob/master/example/extensions/lib_custom_op/Makefile#L21

The example Python code to load the library and test the operator for both symbol and ndarray APIs is here:

https://github.com/apache/incubator-mxnet/blob/master/example/extensions/lib_custom_op/test_gemm.py