...

  1. Benchmarks for Apache MXNet operators.
  2. Benchmarks for common operator combinations (fused operators), e.g., Conv + ReLU, Conv + BatchNorm.
  3. Individual operator benchmarks that capture operator execution time (speed) and memory usage.
  4. Fine-grained individual operator benchmarks that capture time for the forward pass, the backward pass, or both.
  5. Ability to run operator benchmarks with default inputs or to customize them with user-specific inputs.
  6. Ability to run operator benchmarks on CPU/GPU with different flavors of MXNet (mxnet-mkl, mxnet-cu90mkl, etc.).
  7. Benchmarks for operators with varying inputs to uncover performance issues caused by skewed input data. Example: measuring operator performance on small and large input tensors in addition to typical tensor sizes.
  8. Ability to run one operator benchmark, a group of benchmarks, or all of them.
  9. Ability to extract results in multiple usable formats: Python dictionary, JSON, CSV, Markdown.
  10. Statistics:
    1. Mean, Median (P50), P90, P99
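A per-operator benchmark with these statistics could be sketched as follows. This is a minimal illustration, not the actual library API: `benchmark` and its parameters are hypothetical, and a plain Python callable stands in for an MXNet operator.

```python
import json
import statistics
import time


def benchmark(op, *args, runs=100, warmup=10):
    """Time op(*args) over several runs and report summary statistics.

    Hypothetical sketch: `op` stands in for any MXNet operator such as
    nd.add; here we only assume it is a plain callable.
    """
    for _ in range(warmup):  # warm-up runs are excluded from timing
        op(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        op(*args)
        timings.append((time.perf_counter() - start) * 1e6)  # microseconds
    timings.sort()

    def percentile(p):
        # nearest-rank percentile over the sorted timings
        return timings[min(len(timings) - 1, int(p / 100 * len(timings)))]

    return {
        "mean_us": statistics.mean(timings),
        "p50_us": statistics.median(timings),
        "p90_us": percentile(90),
        "p99_us": percentile(99),
    }


# The result is a plain dict, so it serializes directly to JSON
# (and from there to CSV or Markdown).
results = benchmark(lambda x, y: [a + b for a, b in zip(x, y)],
                    list(range(1000)), list(range(1000)))
print(json.dumps(results, indent=2))
```

Returning a plain dictionary keeps the result format-agnostic: the same data can be dumped as JSON, written as a CSV row, or rendered as a Markdown table row.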

Design Tenets

  1. Defaults => Common use cases should be extremely easy; customized, complex use cases should be possible.
    1. Example: I should be able to run Add operator benchmarks without specifying any inputs, and the library should run the benchmarks on valid default inputs. At the same time, as a power user, I should be able to provide my own inputs, such as tensor shapes and the context on which to run the benchmarks.
  2. Minimum Learning Curve => Keep APIs the same as, or close to, the native NDArray / Gluon operators being benchmarked.
    1. Example: If I am benchmarking the nd.add(lhs, rhs) operator, the interface in the benchmark utility should be similar, with zero learning curve.
  3. Modular and Reusable.
  4. Designed for both programmers and automated systems.
    1. Example: a developer using the library directly, or integration with CI/CD.
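The first two tenets together suggest an API shape like the one below. This is a hypothetical sketch only: `run_benchmark`, `DEFAULT_SHAPES`, and the stand-in `add` function are illustrative names, not the library's actual interface, and plain Python lists stand in for NDArrays.

```python
# Hypothetical sketch of the "easy defaults, customization possible" tenet.
DEFAULT_SHAPES = [(1024, 1024), (10000, 1)]  # assumed representative defaults


def run_benchmark(op, inputs=None, ctx="cpu"):
    """Run op on user-supplied inputs, or fall back to valid defaults."""
    if inputs is None:
        # Default path: generate inputs so the common case needs no arguments.
        inputs = [([1.0] * rows * cols, [2.0] * rows * cols)
                  for rows, cols in DEFAULT_SHAPES]
    return [(ctx, len(lhs), op(lhs, rhs)) for lhs, rhs in inputs]


def add(lhs, rhs):
    # Stand-in for nd.add(lhs, rhs): element-wise addition of two lists.
    return [a + b for a, b in zip(lhs, rhs)]


# Common case: no inputs specified, valid defaults are used.
default_results = run_benchmark(add)

# Power user: custom inputs and context.
custom_results = run_benchmark(add, inputs=[([1, 2], [3, 4])], ctx="gpu(0)")
```

Note how the benchmarked operator keeps its native (lhs, rhs) signature, satisfying the minimum-learning-curve tenet: a user who knows nd.add already knows how to benchmark it.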

...