...

When users create a new custom operator, they need to define both the forward() and backward() functions in Python. When the operator is executed, one of these two functions is called. Each function consists of two kinds of code: 1) Python and NumPy code (I call this pure Python code), which runs in the CustomOperator's own worker thread, and 2) code that calls NDArray operators (I call them sub-operators), which is pushed to the engine and runs asynchronously from the CustomOperator's worker threads.
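To make the split between the two kinds of code concrete, here is a minimal sketch that uses only the Python standard library. It is not MXNet code: the `engine` thread pool, `sub_operator()` and `run_custom_op()` are hypothetical stand-ins for the engine, an NDArray operator and the CustomOperator worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the asynchronous engine; not MXNet's real API.
engine = ThreadPoolExecutor(max_workers=2, thread_name_prefix="engine")


def sub_operator(x, y):
    """Plays the role of an NDArray operator; also reports the thread it ran on."""
    return [a + b for a, b in zip(x, y)], threading.current_thread().name


def forward(data):
    """Toy forward(): some pure Python work plus one sub-operator call."""
    # Pure Python code: runs synchronously on the calling (worker) thread.
    scaled = [2 * v for v in data]
    python_thread = threading.current_thread().name
    # Sub-operator: pushed to the engine; submit() returns a future immediately.
    future = engine.submit(sub_operator, scaled, data)
    return python_thread, future


def run_custom_op(data):
    """Runs forward() on a dedicated worker thread, as the CustomOperator would."""
    result = {}

    def work():
        result["python_thread"], result["future"] = forward(data)

    worker = threading.Thread(target=work, name="custom_op_worker")
    worker.start()
    worker.join()
    # The sub-operator may finish after forward() has already returned.
    output, engine_thread = result["future"].result()
    return result["python_thread"], engine_thread, output


python_thread, engine_thread, output = run_custom_op([1, 2, 3])
print(python_thread, engine_thread, output)
engine.shutdown()
```

The pure Python part reports the worker thread's name, while the sub-operator reports one of the engine's threads, which is the separation described above.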

Screen Shot 2019-06-13 at 5 22 39 PM


Also, in CustomOperator’s Push(), a special callback named “CustomOperator” is pushed to the engine. (It is now renamed to “Dummy_Wait”, see the screenshot above; we use the new name below.) The idea is that “Dummy_Wait” has dependencies on the custom operator and is executed last, so that the custom operator event spans the execution of both the pure Python code and the sub-operators.
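The intended role of “Dummy_Wait” can be sketched as a callback that depends on every sub-operator and therefore finishes last. This is a simplified illustration under assumptions, not the engine's implementation: the thread pool, the `timeline` list and the sleep-based `sub_operator()` are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

timeline = []  # (name, start, end) records collected by this toy profiler


def sub_operator(name, seconds):
    """Hypothetical sub-operator that just sleeps and records its own span."""
    start = time.perf_counter()
    time.sleep(seconds)
    timeline.append((name, start, time.perf_counter()))


def push_custom_op(engine):
    """Pushes three sub-operators, then a Dummy_Wait that depends on all of them."""
    op_start = time.perf_counter()
    subs = [engine.submit(sub_operator, f"sub_op_{i}", 0.01) for i in range(3)]

    def dummy_wait():
        # Blocks until every sub-operator is done, so it completes last.
        wait(subs)
        return time.perf_counter()

    op_end = engine.submit(dummy_wait).result()
    # The custom operator event is closed only once Dummy_Wait has run.
    timeline.append(("Custom", op_start, op_end))


with ThreadPoolExecutor(max_workers=4) as engine:
    push_custom_op(engine)

for name, start, end in sorted(timeline, key=lambda ev: ev[1]):
    print(f"{name}: {(end - start) * 1e3:.1f} ms")
```

In this sketch the “Custom” span encloses every sub-operator span, which is the behavior the callback was meant to guarantee.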

...

* The start timestamp of custom operator events is correct, but the end timestamp waits for “Dummy_Wait”, which itself waits for a bunch of variable deletions. As a result, custom operator events appear much longer than the actual computation. Also, because of parallelism, users cannot tell whether, at a specific point in a custom operator event, we are executing pure Python code, executing the sub-operators, or simply waiting for dependencies to clear.
* Because custom operator events will not end until “Dummy_Wait” starts, the former will not wrap around the latter. This violates the TraceCompass format and messes up the chrome://tracing timeline (shown below). Each event has a begin and an end timestamp; however, Chrome treats one event as two separate ones and reports that one has no “BEGIN” and the other has no “END” (refer to the second screenshot below).
* All the custom operator events are called “Custom”. If users have more than one custom operator, they are not able to distinguish them.

image

image

To avoid confusion, those issues need to be fixed.
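The nesting problem in the second bullet can be reproduced with a small check. The sketch below assumes the begin/end ("B"/"E") duration events of the Chrome trace event format; the timestamps are made up for illustration, and `is_properly_nested()` is a hypothetical helper, not part of any profiler.

```python
def to_chrome_events(name, begin_us, end_us, pid=0, tid=0):
    """Builds a begin ("B") and end ("E") duration event pair for one span."""
    return [
        {"name": name, "ph": "B", "ts": begin_us, "pid": pid, "tid": tid},
        {"name": name, "ph": "E", "ts": end_us, "pid": pid, "tid": tid},
    ]


def is_properly_nested(events):
    """Stack check: every "E" must close the most recently opened "B"."""
    stack = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["ph"] == "B":
            stack.append(ev["name"])
        elif ev["ph"] == "E":
            if not stack or stack.pop() != ev["name"]:
                return False
    return not stack


# Overlapping spans: "Custom" ends while "Dummy_Wait" is still open.
overlapping = to_chrome_events("Custom", 0, 100) + to_chrome_events("Dummy_Wait", 90, 150)
# Nested spans: "Custom" fully wraps "Dummy_Wait".
nested = to_chrome_events("Custom", 0, 150) + to_chrome_events("Dummy_Wait", 90, 140)

print(is_properly_nested(overlapping))
print(is_properly_nested(nested))
```

When two events on the same thread overlap without one wrapping the other, the end of the outer event arrives while the inner one is still open, which matches the unmatched “BEGIN”/“END” symptom described above.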

...

Notice that because we are adding a call to GenerateDisplayName() in PushAsync(), we risk adding overhead to every operator call (we need to get the thread id, and the function takes a lock). In practice, however, because this function is short and has early-return checks, the overhead is small enough to accept. On my machine (2017 MacBook Pro 13", i7), the average overhead for regular operator calls is less than 1 microsecond (it appears as 0). For sub-operator calls, the overhead is always < 10 microseconds and averages < 5 microseconds. For comparison, executing NDArray plus scalar (scalar addition) on a 100*100 matrix takes ~150 microseconds. Notice that this relatively larger overhead only applies to sub-operators of custom operators.
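A rough way to see why an early-return check keeps the common path cheap is to time a small Python analogue. Everything here is an assumption made for illustration: `generate_display_name()` is not the real C++ function, and the thread registry and the `Parent::op` naming scheme are invented for the sketch. The numbers it prints depend on the machine and say nothing about the C++ overhead quoted above.

```python
import threading
import time

_lock = threading.Lock()
_custom_op_threads = {}  # thread id -> custom operator name (hypothetical registry)


def generate_display_name(op_name):
    """Hypothetical analogue: prefixes sub-operators with their custom operator."""
    if not _custom_op_threads:
        # Early return: no custom operator is running, so skip the lock entirely.
        return op_name
    tid = threading.get_ident()
    with _lock:
        parent = _custom_op_threads.get(tid)
    return f"{parent}::{op_name}" if parent else op_name


def mean_overhead_us(func, *args, repeats=10_000):
    """Average wall-clock time per call, in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats * 1e6


# Regular operator call: the registry is empty, so the early return is taken.
plain_name = generate_display_name("_plus_scalar")
plain_us = mean_overhead_us(generate_display_name, "_plus_scalar")

# Sub-operator call: this thread is registered as running a custom operator.
_custom_op_threads[threading.get_ident()] = "MyCustomOp"
sub_name = generate_display_name("_plus_scalar")
sub_us = mean_overhead_us(generate_display_name, "_plus_scalar")

print(plain_name, f"{plain_us:.3f} us")
print(sub_name, f"{sub_us:.3f} us")
```

Only the second path takes the lock and looks up the thread id, which mirrors the observation that the larger overhead is limited to sub-operators of custom operators.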

...