...

Motivations/Conclusion:

We do notice potential cases of overfitting here, especially for the Performance label. However, on looking further into the issues labeled as Performance, we notice that similar words and phrases recur across them (for example, in most cases the word "performance" itself, and words like "speed"). The word2vec embeddings that our model was trained on can explain these results: word2vec gives such related words a high cosine similarity, so we can speculate that these common words were grouped together and hence the model was able to predict these labels with high accuracy. Given this data, we can see which labels the model predicts accurately. Given an accuracy threshold, the bot can apply a label to a new issue only when the accuracy for that label surpasses this value. As a result, we would be able to accurately provide some labels to new issues. Overall, the mxnet-label-bot will be able to provide an improved experience for its developers.
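
The threshold idea above could be sketched roughly as follows. This is only an illustration, not the bot's actual implementation: the function name `labels_to_apply`, the label `ExampleLabel`, the accuracy values, and the 0.90 cut-off are all hypothetical placeholders, not measured results.

```python
from typing import Dict, List

# Hypothetical per-label accuracies (placeholder values, NOT measured results).
LABEL_ACCURACY: Dict[str, float] = {
    "Performance": 0.95,
    "ExampleLabel": 0.60,
}

# Assumed cut-off; the real threshold would be chosen from evaluation data.
ACCURACY_THRESHOLD = 0.90


def labels_to_apply(
    predicted: List[str],
    label_accuracy: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Keep only predicted labels whose known accuracy surpasses the threshold.

    Labels with no recorded accuracy are treated as 0.0 and therefore dropped.
    """
    return [
        label
        for label in predicted
        if label_accuracy.get(label, 0.0) > threshold
    ]


if __name__ == "__main__":
    predicted = ["Performance", "ExampleLabel", "UnknownLabel"]
    print(labels_to_apply(predicted, LABEL_ACCURACY, ACCURACY_THRESHOLD))
```

With a filter like this, the bot would stay silent on labels it predicts less reliably, rather than risk attaching a wrong label to a new issue.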

...