...

This worked well most of the time. However, when trying to find such a tree for 6 GPUs, we noticed that the search sometimes got stuck: no edge could be found to link two such clusters. In such cases, we resorted to exhaustive search.

Link usage penalty

Trees are generated sequentially, in the manner described above. To discourage later trees from reusing links taken by earlier trees, we apply a multiplicative penalty, MXNET_KVSTORE_LINK_USAGE_PENALTY (default = 0.7), each time a link is used. The penalty multiplies the corresponding entry of the initial link-topology adjacency matrix, in which 3 represents a double NVLink connection and 2 represents a single NVLink connection.
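The sketch below illustrates the penalty arithmetic: a link with initial weight w0 that has been used k times ends up with weight w0 * penalty^k. It is not MXNet's implementation. The helper names `link_usage_penalty` and `apply_tree_penalty`, the 4-GPU example matrix, the use of 0 for "no direct link", and the edge-list representation of a tree are all hypothetical. Only the environment variable name, its default of 0.7, and the 3/2 NVLink weights come from the text above.

```python
import os


def link_usage_penalty():
    """Read the penalty factor, falling back to the documented default of 0.7.

    Hypothetical helper: mirrors the MXNET_KVSTORE_LINK_USAGE_PENALTY setting
    described in the text, not MXNet's actual parsing code.
    """
    return float(os.environ.get("MXNET_KVSTORE_LINK_USAGE_PENALTY", "0.7"))


def apply_tree_penalty(weights, tree_edges, penalty):
    """Scale the weight of every link a tree uses by `penalty`, in place.

    weights:    square adjacency matrix (list of lists); 3 = double NVLink,
                2 = single NVLink, 0 = no direct link (assumed convention).
    tree_edges: (u, v) GPU pairs used by the tree just generated. A pair listed
                twice is penalized twice, i.e. once per use.
    Links are treated as undirected, so both (u, v) and (v, u) are scaled.
    """
    for u, v in tree_edges:
        weights[u][v] *= penalty
        weights[v][u] *= penalty
    return weights


if __name__ == "__main__":
    # Illustrative 4-GPU topology, not taken from any real machine.
    adjacency = [
        [0, 3, 2, 0],
        [3, 0, 0, 2],
        [2, 0, 0, 3],
        [0, 2, 3, 0],
    ]
    first_tree = [(0, 1), (0, 2), (2, 3)]
    apply_tree_penalty(adjacency, first_tree, link_usage_penalty())
    for row in adjacency:
        print(row)
```

With the default of 0.7, a double NVLink used once drops to 3 * 0.7 = 2.1, which still outweighs an unused single NVLink (2). After two uses it drops to 3 * 0.49 = 1.47, which no longer does. Lowering the penalty therefore spreads later trees across distinct links more aggressively.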

When to switch between single and multiple trees

...