...

MXNet allows users to create custom operators if the existing NDArray operators cannot meet their needs. However, profiling custom operators is not currently well supported.

To see the issues, we first need to understand how custom operators work. CustomOperator is a singleton class with its own task queue and worker threads. When a custom operator is executed by the engine, the engine thread calls Push() in CustomOperator. Push() then creates the actual workload (a callback to forward() or backward(), see below) and pushes it to CustomOperator's task queue, where one of its own worker threads picks the task up and runs it.
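The flow described above can be modeled with a small standard-library sketch. This is a toy stand-in, not MXNet code: the class name `ToyCustomOperator`, its `get()` and `push()` methods, and the `run_demo()` helper are all hypothetical and only mirror the singleton, task queue, and worker thread arrangement.

```python
import queue
import threading


class ToyCustomOperator:
    """Toy model of a singleton with its own task queue and worker thread."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self._tasks = queue.Queue()
        # A single daemon worker drains the queue in the background.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    @classmethod
    def get(cls):
        """Return the one shared instance, creating it on first use."""
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def push(self, callback):
        """Called from the 'engine thread': enqueue the work and return at once."""
        done = threading.Event()
        self._tasks.put((callback, done))
        return done

    def _run(self):
        while True:
            callback, done = self._tasks.get()
            try:
                callback()  # stands in for forward() or backward()
            finally:
                done.set()


def run_demo():
    """Push one fake forward() and report which thread executed it."""
    record = {}

    def forward():
        record["worker_thread"] = threading.get_ident()

    done = ToyCustomOperator.get().push(forward)
    done.wait(timeout=5)
    record["caller_thread"] = threading.get_ident()
    record["ran_on_worker"] = record["worker_thread"] != record["caller_thread"]
    return record
```

The point of the sketch is that the thread calling `push()` returns immediately, while the callback runs later on a different thread, which is why timing taken on the calling side does not describe the actual computation.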

...

Also in CustomOperator’s Push(), a special callback named “CustomOperator” (now renamed to “Dummy_Wait”; we use the new name below) is pushed to the engine. The idea is that “Dummy_Wait” depends on the custom operator and is executed last, so that the custom operator event spans the execution of both the pure Python code and the sub-operators.
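As a rough illustration of why the event end is tied to a final callback, the sketch below stamps the end time only after the Python body and all sub-operators have finished. It is a simplified model under stated assumptions: `profile_custom_op` and `dummy_wait` are hypothetical names, a thread pool stands in for the engine, and a blocking `wait()` stands in for the engine's dependency tracking, which works differently in MXNet.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def profile_custom_op(python_body, sub_operators):
    """Return a toy event whose end is stamped by a final 'dummy wait' task."""
    event = {"name": "Custom", "begin": time.perf_counter()}
    with ThreadPoolExecutor(max_workers=2) as engine:
        python_body()  # the pure Python part of the custom operator
        pending = [engine.submit(op) for op in sub_operators]

        def dummy_wait():
            # Runs last: it blocks until every sub-operator has finished,
            # then records the end timestamp of the whole event.
            wait(pending)
            event["end"] = time.perf_counter()

        engine.submit(dummy_wait).result()
    return event
```

In this model the recorded duration covers everything the final task had to wait for, so any extra dependency attached to it stretches the event beyond the real computation.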

...

* The start timestamp of custom operator events is correct, but the end timestamp waits for “Dummy_Wait”, which itself waits for a number of variable deletions. As a result, custom operator events appear much longer than the actual computation. Also, because of parallelism, users cannot tell whether, at a given point in a custom operator event, we are executing pure Python code, executing the sub-operators, or simply waiting for dependencies to clear.
* Because custom operator events do not end until “Dummy_Wait” starts, the former do not wrap around the latter. This violates the TraceCompass format and messes up the chrome://tracing timeline (shown below). Each event has a begin and an end timestamp; however, Chrome treats one event as two separate ones and reports that one has no “BEGIN” and the other has no “END”.
* All custom operator events are named “Custom”. Users with more than one custom operator cannot tell them apart.
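The nesting problem in the second point can be reproduced with a small checker. This is a minimal sketch that assumes a viewer pairs begin ("B") and end ("E") events on one thread with a stack; it is not Chrome's actual implementation, the function name `find_nesting_errors` is hypothetical, and the timestamps are made up for illustration.

```python
def find_nesting_errors(events):
    """Return messages for B/E events that do not nest like a stack."""
    stack = []
    errors = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["ph"] == "B":
            stack.append(ev["name"])
        elif ev["ph"] == "E":
            if stack and stack[-1] == ev["name"]:
                stack.pop()  # properly closes the innermost open event
            else:
                errors.append(f'{ev["name"]}: END without matching BEGIN')
    # Anything still open never received a usable END.
    errors.extend(f"{name}: BEGIN without END" for name in stack)
    return errors


# Made-up timeline where "Custom" ends after "Dummy_Wait" has begun,
# so the two events cross instead of one wrapping the other.
crossing = [
    {"name": "Custom", "ph": "B", "ts": 0},
    {"name": "Dummy_Wait", "ph": "B", "ts": 5},
    {"name": "Custom", "ph": "E", "ts": 6},
    {"name": "Dummy_Wait", "ph": "E", "ts": 9},
]

# The same events with "Custom" wrapping "Dummy_Wait".
nested = [
    {"name": "Custom", "ph": "B", "ts": 0},
    {"name": "Dummy_Wait", "ph": "B", "ts": 5},
    {"name": "Dummy_Wait", "ph": "E", "ts": 9},
    {"name": "Custom", "ph": "E", "ts": 10},
]
```

Under this model the crossing timeline yields two complaints about the single “Custom” event, one for a missing begin and one for a missing end, while the nested timeline yields none.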

...