...

  1. Provide a generic utility for executing operator benchmarks and performance tests.
    1. It is responsible for creating input tensors of the required shape with a given dtype and context.
    2. Execute the provided operator: forward only, or forward + backward.
    3. This generic utility will be integrated with the MXNet profiler.
    4. Capture the profile output from the MXNet profiler (time, memory).
    5. Return a dictionary of results.
  2. Input for the performance tests will be a key/value config.
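The steps above can be sketched without MXNet as a plain-Python harness, to show the overall flow from a key/value config to a results dictionary. This is a minimal sketch under stated assumptions, not the proposed implementation: `benchmark_op`, `make_input` and `CONTROL_KEYS` are hypothetical names, "tensors" are flat Python lists (dtype and context are ignored), wall-clock time comes from `time.perf_counter`, and peak memory comes from `tracemalloc` instead of the MXNet profiler.

```python
import math
import random
import time
import tracemalloc

# Hypothetical convention for this sketch: keys that configure the harness
# rather than describe an input tensor. run_backward is accepted but ignored,
# because plain Python has no autograd.
CONTROL_KEYS = {"initializer", "run_backward", "dtype"}


def make_input(shape, initializer):
    """Step 1 stand-in: a flat list with prod(shape) values from initializer()."""
    return [initializer() for _ in range(math.prod(shape))]


def benchmark_op(F, inputs, warmup=10, runs=50):
    """Benchmark F once per input config; return one result dict per config."""
    results = []
    for config in inputs:
        init = config.get("initializer", random.random)
        # Step 1: build an input for every key that is not a control key
        args = {name: make_input(shape, init)
                for name, shape in config.items() if name not in CONTROL_KEYS}
        # Step 2: warm up, then time `runs` forward executions
        for _ in range(warmup):
            F(**args)
        start = time.perf_counter()
        for _ in range(runs):
            F(**args)
        elapsed = time.perf_counter() - start
        # Steps 3-4 stand-in: tracemalloc instead of the MXNet profiler,
        # tracing one extra run so tracing overhead does not distort the timing
        tracemalloc.start()
        F(**args)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # Step 5: a dictionary of results for this input config
        results.append({"op": F.__name__,
                        "inputs": {k: v for k, v in config.items() if k not in CONTROL_KEYS},
                        "avg_time_ms": 1000.0 * elapsed / runs,
                        "peak_memory_bytes": peak})
    return results


def add(lhs, rhs):
    """Element-wise add over flat lists, standing in for mx.nd.add."""
    return [a + b for a, b in zip(lhs, rhs)]


if __name__ == "__main__":
    print(benchmark_op(add, inputs=[{"lhs": (256, 256), "rhs": (256, 256),
                                     "run_backward": False}], warmup=2, runs=5))
```

Measuring memory in a separate traced run is a deliberate choice here: `tracemalloc` slows down allocation, so tracing the timed loop would inflate the reported times.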

...

```python
"""
MXNet operator performance benchmarks.

NOTE:
1. You can pass a list of input dictionaries to benchmark an operator with different input configurations.
2. Results are dictionaries of time and memory measurements for the benchmark runs.
"""
import mxnet as mx
from mxnet import nd

# `run_performance_test` is the generic utility proposed above; it is not part of MXNet yet.
# Assumes each call returns a list of result dictionaries (one per input config),
# so results from several calls can be concatenated with `+=`.

# Run performance test for Add operator
results = run_performance_test(F=mx.nd.add, ctx=mx.cpu(), warmup=10, runs=50,
                               inputs=[{"lhs": (1024, 1024),
                                        "rhs": (1024, 1024),
                                        "initializer": nd.normal,
                                        "run_backward": True,
                                        "dtype": "float32"}])

# Run performance test for Conv2D operator
results += run_performance_test(F=mx.gluon.nn.Conv2D, ctx=mx.cpu(), warmup=10, runs=50,
                                inputs=[{"data": (32, 3, 256, 256),
                                         "data_initializer": nd.normal,
                                         "channels": 64,
                                         "kernel_size": (3, 3),
                                         "strides": (1, 1),
                                         "padding": (0, 0),
                                         "dilation": (1, 1),
                                         "layout": "NCHW",
                                         "activation": None,
                                         "run_backward": True,
                                         "dtype": "float32"}])
```
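A single key/value config mixes tensor shapes (`data`), operator hyperparameters (`kernel_size`, `channels`) and harness options (`run_backward`, `dtype`), so the utility has to separate them before calling the operator. Detecting tensors by "value is a tuple" does not work, since `kernel_size` and `strides` are tuples too. A minimal sketch of one way to do the split, assuming each operator declares which keys name tensor inputs; `split_config`, `HARNESS_KEYS` and the `tensor_names` argument are hypothetical and not part of MXNet.

```python
# Keys that configure the benchmark harness itself (hypothetical convention).
HARNESS_KEYS = {"initializer", "data_initializer", "run_backward", "dtype"}


def split_config(config, tensor_names):
    """Split one input config into (tensor shapes, operator params, harness options)."""
    shapes, params, options = {}, {}, {}
    for key, value in config.items():
        if key in tensor_names:
            shapes[key] = value      # e.g. "data": (32, 3, 256, 256)
        elif key in HARNESS_KEYS:
            options[key] = value     # e.g. "run_backward": True
        else:
            params[key] = value      # e.g. "kernel_size": (3, 3)
    return shapes, params, options


conv_config = {"data": (32, 3, 256, 256), "channels": 64, "kernel_size": (3, 3),
               "strides": (1, 1), "run_backward": True, "dtype": "float32"}
shapes, params, options = split_config(conv_config, tensor_names={"data"})
print(shapes)   # {'data': (32, 3, 256, 256)}
print(params)   # {'channels': 64, 'kernel_size': (3, 3), 'strides': (1, 1)}
print(options)  # {'run_backward': True, 'dtype': 'float32'}
```

With this split, `params` could be passed to the operator (or a Gluon block constructor), `shapes` drive input creation, and `options` control the run.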

What does the backend profiling utility code look like?

Below is an example of profiling the Add operator.

```python
import mxnet as mx
from mxnet import profiler

# Configurations
warmup = 25
runs = 50
run_backward = True

# Operator to benchmark
F = mx.nd.add

# Prepare data for the operator
lhs = mx.nd.ones(shape=(1024, 1024))
rhs = mx.nd.ones(shape=(1024, 1024))
lhs.attach_grad()
rhs.attach_grad()
mx.nd.waitall()

# Warmup
print("Warming up....")
for _ in range(warmup):
    with mx.autograd.record():
        res = F(lhs, rhs)
    if run_backward:
        res.backward()
    # MXNet executes asynchronously; block until all pending work is done
    mx.nd.waitall()
print("Done warming up....")

# Run Performance Runs
print("Running performance runs....")
profiler.set_config(profile_all=True, aggregate_stats=True)
# Start Profiler
profiler.set_state('run')
for _ in range(runs):
    with mx.autograd.record():
        res = F(lhs, rhs)
    if run_backward:
        res.backward()
    mx.nd.waitall()

# Stop Profiler
profiler.set_state('stop')

# Fetch Results from Profiler
# We will add 2 new APIs in Profiler: profiler.get_summary(), profiler.reset()
# profiler.get_summary() => will be a JSON string representing the output as shown below.
# profiler.reset() => Resets all the counters in the current profiler.

print("Done Running performance runs....")
# dumps() returns the aggregate stats as a printable string; reset=True clears them afterwards
print(profiler.dumps(reset=True))
```
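With `aggregate_stats=True`, the profiler summarizes events per operator (call count and total, minimum, maximum and average time) instead of listing every call. The sketch below shows that kind of aggregation in plain Python over a list of `(operator_name, duration_ms)` records. The record format, the `aggregate` function and the operator names `add_fwd`/`add_bwd` are assumptions for illustration, not the profiler's internals, and the durations are illustrative values, not measurements.

```python
from collections import defaultdict


def aggregate(records):
    """Group (name, duration_ms) records into per-name summary statistics."""
    grouped = defaultdict(list)
    for name, duration_ms in records:
        grouped[name].append(duration_ms)
    summary = {}
    for name, durations in grouped.items():
        total = sum(durations)
        summary[name] = {
            "count": len(durations),
            "total_ms": total,
            "min_ms": min(durations),
            "max_ms": max(durations),
            "avg_ms": total / len(durations),
        }
    return summary


# Illustrative values only: three forward calls and two backward calls.
records = [("add_fwd", 1.0), ("add_fwd", 2.0), ("add_fwd", 3.0),
           ("add_bwd", 4.0), ("add_bwd", 6.0)]
stats = aggregate(records)
print(stats["add_fwd"]["avg_ms"])  # 2.0
print(stats["add_bwd"]["max_ms"])  # 6.0
```

A summary in this shape maps naturally onto the "dictionary of results" the generic utility is meant to return.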


How to capture time?

We will use the MXNet profiler.
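Timing needs the profiler (or an explicit `mx.nd.waitall()`) because MXNet's execution engine is asynchronous: an operator call returns once the work is queued, not when it finishes, so naive wall-clock timing around the call can measure only the enqueue cost. The stdlib sketch below mimics that behaviour with a thread pool; `slow_op` and its 0.2 s sleep are stand-ins, and `future.result()` plays the role of `waitall()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_op():
    """Stand-in for an operator whose real work takes about 0.2 s."""
    time.sleep(0.2)
    return 42


with ThreadPoolExecutor(max_workers=1) as engine:
    start = time.perf_counter()
    # Naive timing: measures only how long it takes to enqueue the work
    future = engine.submit(slow_op)
    enqueue_s = time.perf_counter() - start

    # Correct timing: block until the result is ready (like mx.nd.waitall())
    future.result()
    blocking_s = time.perf_counter() - start

print(f"enqueue only: {enqueue_s:.4f} s, until completion: {blocking_s:.4f} s")
```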


Pros

  1. No need to write one class per operator to set up a performance test. Whenever a new operator is created, a developer only needs to add a `run_performance_test(..)` line with a list of inputs; the generic utility handles the execution.
  2. Less code, easier to maintain.
  3. More control for users: default inputs, random inputs, or specific user-defined inputs.
  4. Deterministic and better suited for performance benchmarks, reproducibility and CI integration.
  5. With the Python interface:
    1. Easy to maintain and develop.
    2. Reflects the performance as seen by users (the majority of users use the Python interface).
    3. Fastest way to get performance tests in place; we do not have any tests in place as of today.

...