...

The test set here consists of data snippets taken from files in the specific languages covered below.

Precision here represents how often our classifier was correct in applying a label, out of all the times it predicted that label.

Recall here represents how often our classifier correctly applied a label, out of all the times an issue actually had that label.

The F1 score balances the precision and recall scores (it is their harmonic mean).


                        | Label was actually on the issue                                  | Label was not on the issue
Label was predicted     | Desired outcome (true positive)                                  | False Positive – a high precision value means that this is reduced
Label was not predicted | False Negative – a high recall value means that this is reduced | Desired outcome (true negative)
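
In terms of the counts in the table above, these metrics follow the standard definitions (stated here for reference, not specific to this project):

    \text{Precision} = \frac{TP}{TP + FP}, \qquad
    \text{Recall} = \frac{TP}{TP + FN}, \qquad
    F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}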

...

The programming-language classifier was trained on a large amount of data pulled from a wide array of repositories, which is how we are able to deliver these high metrics for the language labels. We use MXNet for deep learning to learn similarities among the languages we consider (the programming languages present in the repo). Specifically, the model was trained on snippets of files pulled from the data present here: https://github.com/aliostad/deep-learning-lang-detection/tree/master/data. We therefore believe this accuracy can be maintained when predicting labels for new issues that contain code snippets. Training was done with a 6-layer deep model in Keras-MXNet, using the 2000 files present in the repository above (split into snippets) for the languages we are interested in. For inference, we use pure MXNet, with the model served using this repository: https://github.com/awslabs/mxnet-model-server
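
As a rough illustration, a 6-layer MLP of this kind could be set up as below with the Keras API (keras-mxnet runs the same code on the MXNet backend). The input feature size, layer widths, dropout rates, and number of language classes here are illustrative assumptions, not the project's actual hyperparameters.

    # Sketch of a 6-layer MLP language classifier using the Keras API
    # (keras-mxnet executes this on the MXNet backend).
    # NOTE: all sizes below are illustrative assumptions.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    NUM_FEATURES = 5000   # assumed length of the snippet feature vector
    NUM_LANGUAGES = 20    # assumed number of language labels

    model = Sequential([
        Dense(512, activation='relu', input_shape=(NUM_FEATURES,)),
        Dropout(0.2),
        Dense(256, activation='relu'),
        Dropout(0.2),
        Dense(128, activation='relu'),
        Dense(NUM_LANGUAGES, activation='softmax'),  # one output per language
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, epochs=10, validation_split=0.1)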


Two models, each handling a specific group of labels, combine to deliver this capability:

TFIDF vectorizer with LinearSVC – used for the more generic labels (e.g. Performance, Bug, Test, ...); see the sketch after this list.

MLP (multilayer perceptron) – used for the coding-language labels (e.g. Clojure, Python, Java, ...), since a larger dataset is available for them.
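
A minimal sketch of the first model, assuming scikit-learn; the issue texts and labels below are placeholders, not the real training data:

    # TFIDF features fed into a linear SVM, as used for the generic labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Placeholder training data; the real bot trains on labeled GitHub issues.
    train_texts = ["Crash when loading the model", "Inference is very slow on GPU"]
    train_labels = ["Bug", "Performance"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(train_texts, train_labels)

    print(clf.predict(["Training speed regression after upgrade"]))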

Motivations/Conclusion:

We do notice potential cases of overfitting here, especially for the Performance label. However, looking further into the issues labeled Performance, we see that similar words and phrases recur across them (in most cases the label word itself, along with words like "speed"). The word embeddings our model trained on produce these results because word2vec assigns such related words a high cosine similarity; we can speculate that these common words were grouped together, which is why the model was able to predict these labels with high accuracy. Given this data, we can see which labels the model predicts accurately. Given a certain accuracy threshold, the bot could label an issue whenever the prediction score surpasses that value, letting us accurately apply some labels to new issues. Overall, the mxnet-label-bot will provide an improved experience for MXNet developers.
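
As a sketch of how such a threshold could gate the bot's suggestions, assuming a classifier that exposes per-class probabilities (e.g. scikit-learn's CalibratedClassifierCV wrapped around LinearSVC); the cutoff value is an assumption, not the project's tuned setting:

    # Only suggest labels whose predicted probability clears the cutoff.
    CONFIDENCE_THRESHOLD = 0.8  # assumed value; would be tuned per label

    def suggest_labels(clf, issue_text):
        """Return the labels the bot is confident enough to apply."""
        probs = clf.predict_proba([issue_text])[0]
        return [label
                for label, p in zip(clf.classes_, probs)
                if p >= CONFIDENCE_THRESHOLD]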
