...

With increasing model and batch sizes we expect it to be dominated by the actual matrix operations.
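As a rough sanity check of that claim, one can time a bare GEMM at a few batch sizes; the sketch below uses NumPy as a stand-in for the framework's fully connected layers (NumPy linked against MKL and the 4096-wide layer are assumptions, purely for illustration).

```python
import time
import numpy as np

def gemm_seconds(batch, features=4096, repeats=20):
    """Average time of a fully-connected-style GEMM: (batch x features) @ (features x features)."""
    a = np.random.rand(batch, features).astype(np.float32)
    w = np.random.rand(features, features).astype(np.float32)
    a @ w                                   # warm-up so one-time initialization doesn't skew timing
    start = time.perf_counter()
    for _ in range(repeats):
        a @ w
    return (time.perf_counter() - start) / repeats

for batch in (1, 8, 64, 256):
    print(f"batch {batch:4d}: {gemm_seconds(batch) * 1e3:8.3f} ms per GEMM")
```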

Convolutional benchmark

AlexNet

Let's take a look at the smaller AlexNet, since it's expected to show the largest differences.

The control group shows, as expected, almost no difference between the different setups – recall that we use the same OpenMP and precompiled MKL.
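If you want to double-check which OpenMP runtime a given benchmark process actually picked up, a minimal sketch (Linux only; passing the PID of the running benchmark is an assumption about your setup) is to scan its memory maps:

```python
import re
import sys

def loaded_openmp_runtimes(pid="self"):
    """Return the OpenMP runtime libraries mapped into a process (Linux /proc only)."""
    libs = set()
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            match = re.search(r"\S*lib(gomp|iomp5|omp)\S*\.so\S*", line)
            if match:
                libs.add(match.group(0))
    return sorted(libs)

if __name__ == "__main__":
    # e.g. `python check_omp.py 12345` with the benchmark's PID
    print(loaded_openmp_runtimes(sys.argv[1] if len(sys.argv) > 1 else "self"))
```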

...

We see the same behaviour in the treatment group, no matter which OpenMP is used.



Control group


The treatment group shows no difference other than that "GCC-swing". Normalizing the data gives average scores within ~1% of each other, which is close to the standard error.



Treatment group
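The normalization mentioned above is nothing fancy; a minimal sketch with made-up throughput numbers (the real per-batch-size values would come from the measurements, and the setup labels are only examples):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-batch-size throughput in img/s -- substitute the measured values.
baseline  = [512.0, 840.0, 1010.0, 1100.0]   # e.g. the reference setup
candidate = [505.0, 832.0, 1002.0, 1095.0]   # e.g. the setup being compared

# Normalize each batch-size point by the baseline, then summarize.
ratios = [c / b for c, b in zip(candidate, baseline)]
avg_diff = (1.0 - mean(ratios)) * 100.0
stderr = stdev(ratios) / sqrt(len(ratios)) * 100.0

print(f"average difference: {avg_diff:.2f}% (standard error {stderr:.2f}%)")
```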


ResNet152

Now we can observe a beautiful saturation of the throughput. The optimal batch size is between 16 and 32.
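A throughput sweep like this can be reproduced in a few lines; the sketch below uses torchvision's ResNet-152 on CPU purely for illustration (the framework and input shape are assumptions, not necessarily what the original runs used):

```python
import time
import torch
import torchvision.models as models

def throughput(batch_size, iters=10):
    """Forward-pass throughput (img/s) for ResNet-152 on CPU at a given batch size."""
    model = models.resnet152().eval()
    x = torch.randn(batch_size, 3, 224, 224)
    with torch.no_grad():
        model(x)                               # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

for bs in (1, 2, 4, 8, 16, 32, 64):
    print(f"batch {bs:3d}: {throughput(bs):7.1f} img/s")
```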

...

We can see two clear patterns:

  • Newer compilers perform better than older ones.
  • GOMP is slower than IOMP.

But the overall differences are close to the standard error and don't even reach 2%.

faster-rcnn Benchmark

...

As we can see, GOMP delivers ~3-5% worse performance than IOMP.
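Switching runtimes for the same binary can be scripted; a minimal sketch, assuming hypothetical library paths and a hypothetical benchmark command (both need to be adjusted to the actual setup), preloads the chosen OpenMP runtime via LD_PRELOAD:

```python
import os
import subprocess

# Hypothetical paths -- point these at the actual libgomp/libiomp5 builds.
RUNTIMES = {
    "GOMP": "/usr/lib/x86_64-linux-gnu/libgomp.so.1",
    "IOMP": "/opt/intel/lib/intel64/libiomp5.so",
}

def run_with_runtime(name, lib, cmd=("./benchmark", "--model", "faster-rcnn")):
    """Run the (hypothetical) benchmark command with the chosen OpenMP runtime preloaded."""
    env = dict(os.environ, LD_PRELOAD=lib, OMP_NUM_THREADS="16")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    print(f"{name}: {result.stdout.strip()}")

for name, lib in RUNTIMES.items():
    run_with_runtime(name, lib)
```

Whether a binary linked against one runtime behaves well with the other preloaded depends on the toolchain, so treat this as a starting point rather than a drop-in procedure.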

...