...

(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score)
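For reference, the accuracy metric linked above is simply the fraction of exact label matches. A minimal sketch using the linked sklearn function, with made-up labels for illustration:

```python
# Accuracy as computed by sklearn: the fraction of issues whose predicted
# label exactly matches the true label. Labels here are illustrative only.
from sklearn.metrics import accuracy_score

y_true = ["Bug", "Test", "Doc", "Performance"]
y_pred = ["Bug", "Test", "Bug", "Performance"]
print(accuracy_score(y_true, y_pred))  # 3 of 4 correct -> 0.75
```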

Classification Accuracy:

Label          Accuracy   Issue Count
Performance    100%       87
Test           99.59%     245
Clojure        98.90%     12 (Test set: 1000)
Java           98.50%     2 (Test set: 1000)
Python         98.30%     170 (Test set: 1000)
C++            97.20%     2 (Test set: 1000)
Scala          96.30%     40 (Test set: 1000)
Question       97.02%     302
Doc            90.32%     155
Installation   84.07%     113
Example        80.81%     99
Bug            78.66%     389
Build          69.87%     156
onnx           69.57%     23
scala          67.24%     58
gluon          44.38%     160
flaky          42.78%     194
Feature        32.24%     335
C++            29.33%     75
ci             28.30%     53
Cuda           22.09%     86

Language Detection from Code Snippets in Issues:

Language   Accuracy
Clojure    98.90%
Java       98.50%
Python     98.30%
C++        97.20%
Scala      96.30%



In-depth analysis with precision, recall, and F1:

Classification report with precision, recall, and F1 score

Label          Precision   Recall    F1 Score   Count
Performance    100%        100%      100%       87
Test           99.59%      100%      99.8%      245
Clojure        98.31%      98.90%    98.61%     12 (Test set: 1000)
Python         98.70%      98.30%    98.50%     170 (Test set: 1000)
Question       100%        97.02%    98.49%     302
Java           97.24%      98.50%    97.87%     2 (Test set: 1000)
C++            98.28%      97.20%    97.74%     2 (Test set: 1000)
Scala          97.37%      96.30%    96.84%     40 (Test set: 1000)
Doc            100%        90.32%    94.92%     155
Installation   100%        84.07%    91.35%     113
Example        100%        80.81%    89.39%     99
Bug            100%        78.66%    88.06%     389
Build          100%        69.87%    82.26%     156
onnx           80%         84.21%    82.05%     23
scala          86.67%      75%       80.41%     58
gluon          62.28%      60.68%    61.47%     160
flaky          96.51%      43.46%    59.93%     194
Feature        32.43%      98.18%    48.76%     335
C++            55%         38.6%     45.36%     75
ci             48.39%      40.54%    44.12%     53
Cuda           22.09%      100%      36.19%     86

...


Precision

...

Precision here represents how often the classifier was correct when it predicted a given label: of all the issues it assigned that label, the fraction that truly belonged to it.

...


The F1 score balances precision and recall by taking their harmonic mean.
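As a sketch of how these three numbers relate, here they are computed by hand on illustrative counts (not the real counts from the tables above):

```python
# Hand computation of precision, recall, and F1 for a single label.
# The counts below are made up for illustration.
def precision(tp, fp):
    # Of all issues we predicted with this label, how many were correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all issues that truly carry this label, how many did we find?
    return tp / (tp + fn)

def f1(p, r):
    # Harmonic mean: punishes a large gap between precision and recall.
    return 2 * p * r / (p + r)

# Example: the model predicted "Bug" 10 times, 8 correctly;
# 12 issues are actually labeled "Bug", so 4 were missed.
p = precision(tp=8, fp=2)   # 0.8
r = recall(tp=8, fn=4)      # 0.666...
print(round(f1(p, r), 4))   # 0.7273
```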

The language-detection model was trained on large amounts of code pulled from a wide array of repositories, which is why we are able to deliver these high metrics for programming languages. We use MXNet for deep learning to learn similarities among the languages we consider (the programming languages present in the repo). Specifically, the model was trained on code snippets from the data files available here: https://github.com/aliostad/deep-learning-lang-detection/tree/master/data.
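As a rough intuition for what such a model picks up on (this is NOT the MXNet implementation described above, just a toy stand-in), languages can often be separated by their characteristic character sequences. A simplified character-trigram overlap detector, with hypothetical reference snippets:

```python
# Toy sketch: guess the language of a snippet by character-trigram
# overlap against one reference snippet per language. A simplified
# illustration only; the real model is a deep network trained in MXNet.
from collections import Counter

def trigrams(text):
    # Count every 3-character window in the text.
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def detect(snippet, references):
    # references: {language: example source string} (hypothetical data)
    snip = trigrams(snippet)
    scores = {}
    for lang, ref in references.items():
        ref_tri = trigrams(ref)
        # Overlap score: total trigram mass shared with the reference.
        scores[lang] = sum(min(c, ref_tri[g]) for g, c in snip.items())
    return max(scores, key=scores.get)

refs = {
    "Python":  "def main():\n    print('hello')\nimport os\n",
    "Java":    'public static void main(String[] args) { System.out.println("hi"); }',
    "Clojure": '(defn main [] (println "hello"))',
}
print(detect("import sys\ndef f():\n    return 1\n", refs))  # Python
```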


Motivations/Conclusion:

We do notice a possible case of overfitting here, especially for the Performance label. However, looking further into the issues labeled as Performance, we see that similar words and phrases recur across them (in most cases the label word itself, along with words like "speed"). Given this data, we are able to see which labels the model can predict accurately. Given a certain accuracy threshold, the bot has the potential to label an issue whenever the model's measured accuracy for that label surpasses this value. As a result, we would be able to accurately apply labels to new issues.
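The thresholding idea above could be sketched as follows; the per-label accuracies are taken from the table above, while the function name, the subset of labels, and the 90% threshold are illustrative assumptions:

```python
# Sketch of thresholded auto-labeling: only apply a predicted label when
# the model's measured accuracy for that label clears a chosen bar.
# Accuracies come from the classification table; the threshold is assumed.
LABEL_ACCURACY = {
    "Performance": 1.00, "Test": 0.9959, "Question": 0.9702,
    "Doc": 0.9032, "Bug": 0.7866, "Feature": 0.3224, "Cuda": 0.2209,
}

def should_auto_label(predicted_label, threshold=0.90):
    # Unknown labels default to 0.0 accuracy and are never auto-applied.
    return LABEL_ACCURACY.get(predicted_label, 0.0) >= threshold

print(should_auto_label("Test"))     # True: 99.59% clears the bar
print(should_auto_label("Feature"))  # False: 32.24% does not
```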

...