...

If you set epochs=1, then number_of_iterations is logically the same as the number of epochs on a single-node system.

Note that the number of model configurations does not need to match the number of segments, as it does in the toy example above. In fact, it usually will not. If you have more model configurations than segments, some of the configurations are held in a queue while the others are being trained, and the queued ones are trained in round-robin fashion. Conversely, if you have fewer model configurations than segments, some segments will not be busy 100% of the time because they will be waiting for a model configuration to train.
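The queueing behavior described above can be illustrated with a small simulation. This is only a sketch under a simplifying assumption: configurations sit on a ring whose first positions are segments and whose remaining positions act as the waiting queue, and the ring rotates by one position per hop. The function name simulate_iteration and its parameters are hypothetical and are not part of any library API; the real scheduler may order work differently.

```python
def simulate_iteration(num_configs, num_segments):
    """Toy round-robin pass in which every config visits every segment once.

    Returns (schedule, utilization). schedule[h][s] is the config trained on
    segment s during hop h, or None if that segment is idle.
    """
    # Ring positions: the first num_segments are segments, any extra ones
    # model the queue of configs waiting for a free segment.
    slots = max(num_configs, num_segments)
    ring = list(range(num_configs)) + [None] * (slots - num_configs)

    schedule = []
    busy = 0
    for _ in range(slots):
        on_segments = ring[:num_segments]
        schedule.append(on_segments)
        busy += sum(cfg is not None for cfg in on_segments)
        # Rotate by one: each config moves to the next position on the ring.
        ring = ring[-1:] + ring[:-1]

    utilization = busy / (slots * num_segments)
    return schedule, utilization


if __name__ == "__main__":
    for configs, segments in [(3, 2), (2, 4)]:
        hops, util = simulate_iteration(configs, segments)
        print(f"{configs} configs on {segments} segments: "
              f"{len(hops)} hops, utilization {util:.0%}")
```

In this model, having more configurations than segments keeps every segment busy at the cost of more hops, while having fewer configurations leaves segments idle for part of each pass.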

Example

Below are results from training on the well-known CIFAR-10 dataset using two different CNNs with 500K-1M weights and various hyperparameters. In total, 16 different model configurations were trained on a cluster of 16 segments. (As mentioned above, the number of model configurations does not need to be the same as the number of segments.) The model configuration with the best validation accuracy is shown in the chart.

...