Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

View file
namenccl.pdf
height400
View file
nameallreduce.pdf
height400

                    (a) Ring algorithm used by NCCL                                       (b) Parameter server algorithm

...

View file
nametree.pdf
height400
View file
namemulti-tree.pdf
height400

                             (a) Single tree algorithm                                             (b) Multiple tree algorithm

...

We looked at using two methods to generate the binary tree. Since we are looking for a balanced binary tree instead of a maximum spanning tree, we cannot use polynomial time algorithms such as Kruskal's or Boruvka's MST algorithm. Using the link topology saved in an adjacency matrix as input, we used:

  • Kernighan-Lin heuristicalgorithm: This is a popular hierarchical clustering method used to generate graph clusterings based on the Kernighan-Lin algorithm [4]. It is part of the popular METIS package [5].
  • Exhaustive search: We observed that in practice, Kernighan-Lin would get stuck due to our modification to the algorithm. In such cases, we resort to exhaustive search.

Since we did not want to introduce additional dependencies to MXNet, we implemented our own version of the Kernighan-Lin algorithm. We modify the Kernighan-Lin algorithm by doing recursive clustering on the graphfor a purpose it was not intended for, i.e. to find a binary tree embedding on the link topology we are interested in. Our algorithm works as follows:

1.

...

Add the vertex we want to be root

...

to the set of root nodes. Begin with entire graph in one cluster.

2. While at least one cluster has at least 2 vertices

...

:

a) Apply Kernighan-Lin heuristic to discover a clustering of the graph into

...

2. Pass in what GPU we want to be root of the tree.

...

two clusters.

b) Look for the edge that satisfies the following conditions:

      • Is one of the edges that crosses between the two clusters
      • One vertex of the edge is one of the
  • root
      • rootnodes
      • Has the highest weight of all edges that satisfy the above conditions

...

c) If such an edge is found, add both vertices to our binary tree discovered so far. Also, add both vertices to the set of root nodes for the next iteration.

    If no such edge is found, use exhaustive search to find a binary tree from this root.

3. Save the binary tree found in order to do reduction.

This worked well most of the time. However, when trying to find such a tree for 6 or 7 GPUs, we notice that sometimes this gets stuck and an edge cannot be found to link two such clusters. In such cases, we resorted to exhaustive search.

...