...

For 3), we will also use CustomOpProfiler. We will maintain a mapping from each CustomOperator worker thread_id to the registered name of the custom operator running on that thread. Then, in PushAsync() in threaded_engine.cc, we will call GenerateDisplayName() in CustomOpProfiler to check whether the calling thread is a CustomOperator worker thread. If it is, the operator being pushed to the engine is a sub-operator of a custom operator, and we build its display name by prefixing the operator's name with the name of the custom operator, for example “MyOp::_plus_scalar”. Furthermore, in class ProfileOperator in profiler.h, we check the display name of the operator: if the name contains “::”, we profile it within the domain “Custom Operator”.
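The thread_id mapping described above can be sketched roughly as follows. This is a simplified stand-in, not the actual CustomOpProfiler: the class name CustomOpProfilerSketch and the OnCustomBegin()/OnCustomEnd() hooks are assumptions made for illustration, and only GenerateDisplayName() is named in this proposal.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for CustomOpProfiler (illustration only).
class CustomOpProfilerSketch {
 public:
  // Assumed hook: a custom operator worker thread records the registered
  // name of the custom operator it is about to run.
  void OnCustomBegin(const std::string& custom_op_name) {
    std::lock_guard<std::mutex> lock(mutex_);
    tid_to_op_name_[std::this_thread::get_id()] = custom_op_name;
  }

  // Assumed hook: the worker thread clears its entry when the custom
  // operator finishes.
  void OnCustomEnd() {
    std::lock_guard<std::mutex> lock(mutex_);
    tid_to_op_name_.erase(std::this_thread::get_id());
  }

  // Returns "CustomOpName::op_name" when called from a thread that is
  // currently running a custom operator; otherwise returns op_name as is.
  std::string GenerateDisplayName(const std::string& op_name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = tid_to_op_name_.find(std::this_thread::get_id());
    if (it == tid_to_op_name_.end()) {
      return op_name;  // early return: not a custom operator worker thread
    }
    return it->second + "::" + op_name;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::thread::id, std::string> tid_to_op_name_;
};
```

In this sketch, a call to GenerateDisplayName("_plus_scalar") made between OnCustomBegin("MyOp") and OnCustomEnd() on the same thread yields "MyOp::_plus_scalar"; on any other thread it yields "_plus_scalar" unchanged.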

More discussions

  • With this enhanced custom operator profiling, we also want to get rid of profiling “Dummy_Wait” entirely. This is done by adding a check in ProfileOperator in profiler.h.
  • Notice that because we are adding a call to GenerateDisplayName() in PushAsync(), we risk adding overhead to every operator call (we need to get the thread id, and the function takes a lock). In practice, however, because this function is short and has early-return checks, the overhead is small enough to accept. On my machine (2017 13-inch MacBook Pro, i7), the average overhead for regular operator calls is less than 1 microsecond (it appears as 0). For sub-operator calls, the overhead is always < 10 microseconds and averages < 5 microseconds. For comparison, executing NDArray plus scalar on a 100*100 matrix takes ~150 microseconds. Notice that this relatively larger overhead applies only to sub-operators of custom operators.
  • Currently CustomOperator has its own thread pool. In the future this may change so that custom operator calls use engine worker threads directly. In that case, the proposed mapping from thread_id to custom operator names would continue to work.
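The name-based checks proposed for ProfileOperator could look roughly like the sketch below. The enum and the function ClassifyByDisplayName() are hypothetical names introduced here, and the sketch assumes that the dummy wait operator reaches the profiler with a display name that is exactly "Dummy_Wait".

```cpp
#include <string>

// Hypothetical result type (illustration only; not part of profiler.h).
enum class ProfileDecision {
  kSkip,                  // do not profile this operator at all
  kCustomOperatorDomain,  // profile under the "Custom Operator" domain
  kOperatorDomain         // profile as a regular operator
};

// Decides how to treat an operator based only on its display name.
inline ProfileDecision ClassifyByDisplayName(const std::string& display_name) {
  // Assumption: the wait operator's display name is exactly "Dummy_Wait".
  if (display_name == "Dummy_Wait") {
    return ProfileDecision::kSkip;
  }
  // A "::" separator marks a sub-operator of a custom operator.
  if (display_name.find("::") != std::string::npos) {
    return ProfileDecision::kCustomOperatorDomain;
  }
  return ProfileDecision::kOperatorDomain;
}
```

Because the decision depends only on the string, it adds no extra state to the profiler, which is also why the naming limitation discussed under Limitations arises.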

Limitations

The first limitation is that, in my current implementation, ProfileOperator in profiler.h distinguishes regular operators from sub-operators based on the name alone. If the name contains "::", we treat it as a sub-operator; if the name is "Custom", we treat it as a custom operator. However, when users register a new custom operator, they could actually name it "Custom" or "XX::XX", and ProfileOperator would then misclassify it. One solution is to add a check when users register new custom operators and reject "Custom" or any name that contains "::". But this risks forcing users to change existing models.
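The registration-time check mentioned above might be sketched like this. The function name IsReservedCustomOpName() is hypothetical, and where such a check would be called from during registration is left open here.

```cpp
#include <string>

// Hypothetical helper: returns true if a user-supplied custom operator name
// would collide with the name-based classification in ProfileOperator.
inline bool IsReservedCustomOpName(const std::string& name) {
  // "Custom" is the name under which custom operators themselves appear.
  if (name == "Custom") {
    return true;
  }
  // "::" is the separator used in sub-operator display names.
  return name.find("::") != std::string::npos;
}
```

A registration path could reject a name, or only warn about it, when this helper returns true; warning instead of rejecting would avoid breaking existing models at the cost of possible misclassification.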


Alternatively, we could use a context manager to help distinguish regular operators from sub-operators. This would involve wrapping the frontend custom op forward() and backward() with a context manager and adding a bool "is_custom" to all the backend APIs along the call stack used when we push an operator to the engine. This method would be far more invasive than the proposed "thread_id mapping" method.

Visualization

Below is the new visualization after my change:

...