
Issue

When we profile operators, the call count reported for each operator in AggregateStats is twice the actual value. In other words, every operator call is counted twice. This issue is isolated to operators: AggregateStats entries in "Device Storage", "MXNET_C_API", and other domains are not affected.

Background

In profiler.h, we have several classes such as ProfileTask, ProfileEvent, and ProfileOperator. These "profile classes" have start() and stop() functions that are called before and after the particular event we want to profile. Within those classes, we also have nested classes such as OprExecStat, EventStat, and TaskStat. These "stat classes" hold the information that we want to dump for each event we profile.

The idea is that the "profile classes" send instances of the "stat classes" to AggregateStats. AggregateStats then processes the stats one by one and maps the name of each operator/API to its AggregateStats entry.

With that in mind, we can see what causes the duplication. When we profile operators, we use ProfileOperator; however, ProfileOperator also has a member variable "as_task_", which is of class ProfileTask. The intention is to generate two events that fall into different domains for a single operator call. However, because those two events carry the same operator name, and AggregateStats maps stats to entries by name, both are counted under the same entry, which doubles the call count.
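The double counting described above can be reproduced with a small toy model. This is only a sketch: `Stat`, `ToyAggregateStats`, `ProfileOneOperatorCall`, and `ProfileOneApiCall` are hypothetical names invented for illustration, the domain strings "operator" and "task" are placeholders, and none of this matches the real signatures in profiler.h or aggregate_stats.cc. It assumes only what the text states: stats are keyed by name, and one operator call produces two stats with the same name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a "stat class" instance.
struct Stat {
  std::string name;    // operator or API name
  std::string domain;  // placeholder domain label, for illustration only
};

// Hypothetical stand-in for AggregateStats: entries are keyed by name only.
class ToyAggregateStats {
 public:
  void OnProfileStat(const Stat& stat) { ++counts_[stat.name]; }

  std::size_t CallCount(const std::string& name) const {
    auto it = counts_.find(name);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::size_t> counts_;
};

// One operator call emits two stats that share the same name:
// one from the operator profile object and one from its as_task_ member.
inline void ProfileOneOperatorCall(ToyAggregateStats* agg,
                                   const std::string& op_name) {
  agg->OnProfileStat({op_name, "operator"});
  agg->OnProfileStat({op_name, "task"});
}

// One C API call emits a single stat, so it is counted once.
inline void ProfileOneApiCall(ToyAggregateStats* agg,
                              const std::string& api_name) {
  agg->OnProfileStat({api_name, "MXNET_C_API"});
}
```

In this model, calling `ProfileOneOperatorCall` once leaves the operator's entry at a count of 2, while a single `ProfileOneApiCall` leaves the API's entry at 1, which mirrors the symptom described in the Issue section.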

"MXNET_C_API" calls do not cause duplication, because for them we use ProfileTask only. In other words, we generate only one event for each call.

Solution

All the "stat classes" inherit from ProfileStat in profiler.h. There, we can add a new bool member variable "enable_aggregate_". This variable defaults to true and controls whether AggregateStats uses or skips the stat (an if statement is added in OnProfileStat() in aggregate_stats.cc). We also add another "enable_aggregate_" to ProfileTask. The idea is that we can set this bool on the ProfileTask and propagate its value to ProfileStat's "enable_aggregate_" through the lambda function in ProfileTask::SendStat(). Finally, in ProfileOperator, we set the "enable_aggregate_" of "as_task_" to false. This way, we continue to produce two events/stats for each operator call, but only the one generated by ProfileOperator is registered in AggregateStats. The stat generated by "as_task_"/ProfileTask is skipped, so the duplication no longer occurs.
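A minimal sketch of this approach, under stated assumptions, is shown here. The `Toy*` classes are hypothetical stand-ins rather than the actual MXNet classes, their `SendStat` signature is invented for the example, and the real code propagates the flag inside a lambda rather than by direct assignment. Only the member names "enable_aggregate_" and "as_task_" and the check in OnProfileStat() come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for ProfileStat with the proposed flag.
struct ToyProfileStat {
  std::string name;
  bool enable_aggregate_ = true;  // defaults to true, as proposed
};

// Hypothetical stand-in for AggregateStats.
class ToyAggregateStats {
 public:
  void OnProfileStat(const ToyProfileStat& stat) {
    if (!stat.enable_aggregate_) return;  // the added if statement
    ++counts_[stat.name];
  }

  std::size_t CallCount(const std::string& name) const {
    auto it = counts_.find(name);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::size_t> counts_;
};

// Hypothetical stand-in for ProfileTask: it carries its own flag and
// copies it into the stat it sends.
class ToyProfileTask {
 public:
  explicit ToyProfileTask(std::string name) : name_(std::move(name)) {}

  void SendStat(ToyAggregateStats* agg) const {
    ToyProfileStat stat;
    stat.name = name_;
    stat.enable_aggregate_ = enable_aggregate_;  // propagate the flag
    agg->OnProfileStat(stat);
  }

  bool enable_aggregate_ = true;

 private:
  std::string name_;
};

// Hypothetical stand-in for ProfileOperator: it still emits two stats,
// but the one from as_task_ is marked to be skipped by aggregation.
class ToyProfileOperator {
 public:
  explicit ToyProfileOperator(const std::string& name)
      : name_(name), as_task_(name) {
    as_task_.enable_aggregate_ = false;
  }

  void SendStat(ToyAggregateStats* agg) const {
    ToyProfileStat stat;
    stat.name = name_;
    agg->OnProfileStat(stat);  // counted
    as_task_.SendStat(agg);    // emitted, but skipped by the aggregator
  }

 private:
  std::string name_;
  ToyProfileTask as_task_;
};
```

With this arrangement, a standalone `ToyProfileTask` (the "MXNET_C_API" case) keeps the default of true and is still counted once, while an operator call is also counted once because its task-side stat is filtered out at aggregation time. Both events are still produced, so any other consumer of the stats is unaffected in this model.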


