Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Double NVLink connections (50 GB/s)
  2. Single NVLink connections (25 GB/s)
  3. PCI-E connection + CPU (10 GB/s)

Image RemovedImage Added

                                                        (a) 2D view                                                                              view                                                                (b) 3D view (PCI-E not shown)

Figure 3. Link topology of p3.16xlarge instance.

...

First, given a binary tree we show that it gives us the precise sequence of GPU sends in order to perform a reduce. On the left will be the link topology in 3D view. On the right is the portion of the binary tree we are interested in.

(a) Step 1: GPU1 sends to GPU5, where the combined results of GPU1 and GPU5 are reduced on GPU5.

(b) Step 2: GPU7 sends to GPU5, where the combined results of GPU1, GPU3, GPU5 and GPU7 are reduced on GPU5.

(c) Step 3: GPU4 sends to GPU5, where the combined results of all 8 GPUs are reduced on GPU5.

(d) On link topology (left), the sequence of sends forms a spanning tree. The operations also correspond to a binary tree (right).

...