...

Label          Precision   Recall    F1 Score   Count
Performance    100%        100%      100%       87
Test           99.59%      100%      99.8%      245
Clojure        98.31%      98.90%    98.61%     12 (Test set: 1000)
Python         98.70%      98.30%    98.50%     170 (Test set: 1000)
Question       100%        97.02%    98.49%     302
Java           97.24%      98.50%    97.87%     2 (Test set: 1000)
C++            98.28%      97.20%    97.74%     2 (Test set: 1000)
Scala          97.37%      96.30%    96.84%     40 (Test set: 1000)
Doc            100%        90.32%    94.92%     155
Installation   100%        84.07%    91.35%     113
Example        100%        80.81%    89.39%     99
Bug            100%        78.66%    88.06%     389
Build          100%        69.87%    82.26%     156
onnx           80%         84.21%    82.05%     23
gluon          62.28%      60.68%    61.47%     160
flaky          96.51%      43.46%    59.93%     194
Feature        32.43%      98.18%    48.76%     335
ci             48.39%      40.54%    44.12%     53
Cuda           22.09%      100%      36.19%     86

For the programming-language labels, the test set consists of data snippets of files in the specific languages (1000 snippets per language, as noted in the Count column).

Precision here represents how often our classifier was correct when it predicted a given label, i.e. the fraction of all predictions of that label that were actually correct.
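
As a quick worked check of how these columns relate (F1 is the harmonic mean of precision and recall), the small snippet below reproduces the F1 value for the flaky row from its precision and recall; the function name is just illustrative:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Using the "flaky" row from the table above:
# precision = 96.51%, recall = 43.46%
print(round(f1_score(0.9651, 0.4346) * 100, 2))  # -> 59.93, matching the F1 Score column
```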

...

The programming-language classifiers were trained on large amounts of data pulled from a wide array of repositories. We are able to deliver these high metrics, especially for programming languages, by using MXNet for deep learning to learn similarities among the languages we consider (the programming languages present in the repo). Specifically, the model was trained on data snippets of files pulled from the data files present here: https://github.com/aliostad/deep-learning-lang-detection/tree/master/data. Thus, we believe this accuracy can be maintained when predicting on new issues that contain code snippets.
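
The exact model architecture is not described on this page; as a rough illustration only, the sketch below shows how a snippet-to-language classifier could be trained with MXNet Gluon on hashed character n-gram features. The feature scheme, layer sizes, label list, and all names are assumptions for the sketch, not the actual implementation.

```python
# Minimal sketch: code snippet -> programming-language classifier in MXNet Gluon.
# Feature scheme (hashed character trigrams), layer sizes, and label set are
# illustrative assumptions, not the implementation described above.
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon import nn

LANGS = ['Clojure', 'Python', 'Java', 'C++', 'Scala']
NUM_FEATURES = 2 ** 16  # hashed n-gram buckets

def featurize(snippet):
    """Hash character trigrams of a snippet into a fixed-size count vector."""
    vec = nd.zeros(NUM_FEATURES)
    for i in range(len(snippet) - 2):
        vec[hash(snippet[i:i + 3]) % NUM_FEATURES] += 1
    return vec

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'),
        nn.Dense(len(LANGS)))              # logits over languages
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})

def train_step(snippets, labels):
    """One mini-batch update; labels are integer indices into LANGS."""
    X = nd.stack(*[featurize(s) for s in snippets])
    y = nd.array(labels)
    with autograd.record():
        loss = loss_fn(net(X), y)
    loss.backward()
    trainer.step(len(snippets))
    return loss.mean().asscalar()

def predict(snippet):
    """Return the most likely language label for a snippet."""
    logits = net(featurize(snippet).expand_dims(axis=0))
    return LANGS[int(logits.argmax(axis=1).asscalar())]
```

In this sketch, snippets from the repository above would be fed through train_step in batches, and predict applied to code blocks extracted from new issues.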


Motivations/Conclusion:

We do notice a possible case of overfitting here, especially for the Performance label. However, looking further into the issues labeled Performance, we notice that similar words and phrases recur across them (in most cases the label word itself, along with related words like "speed"). Because our word embeddings were trained with word2vec on this data, such related words end up with high cosine similarity, which is what lets the model achieve these kinds of results. Given these metrics, we can see which labels the model predicts accurately. Given a certain confidence threshold, the bot can apply a label to an issue whenever its prediction surpasses that value. As a result, we would be able to accurately provide labels to new issues.
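
As a minimal sketch of these two ideas (cosine similarity between embedding vectors, and threshold-gated labeling), the snippet below uses placeholder vectors, label names, and a 0.8 threshold that are assumptions for illustration rather than the project's actual values:

```python
# Sketch: cosine similarity between word vectors, plus applying only labels
# whose predicted confidence clears a threshold. Values are placeholders.
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# In practice the word2vec vectors for e.g. "performance" and "speed" would be
# looked up from the trained embedding matrix; random placeholders are used here.
rng = np.random.default_rng(0)
vec_performance = rng.normal(size=100)
vec_speed = rng.normal(size=100)
print(cosine_similarity(vec_performance, vec_speed))

def labels_above_threshold(label_probs, threshold=0.8):
    """Keep only labels whose predicted confidence clears the threshold."""
    return [label for label, p in label_probs.items() if p >= threshold]

# Hypothetical per-label confidences for one new issue:
print(labels_above_threshold({'Performance': 0.97, 'Bug': 0.35, 'Test': 0.81}))
# -> ['Performance', 'Test']
```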

...