...

The labels below are the ones initially chosen for the model to predict. Only issues relevant to these labels are included in the evaluation: for a given label, an issue is tested only if the model predicted that label or the label was actually on the issue. The accuracy shown below is the fraction of those issues for which the model predicted the label and that label was also one of the issue's actual labels in the repo.
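The per-label evaluation described above can be sketched in plain Python as follows. This is an illustration, not the bot's actual evaluation code: it assumes each issue's predicted and actual labels are available as sets of label names (a hypothetical data layout), and the function name `label_accuracy` is hypothetical. Once the issues are restricted to the relevant subset, counting agreements is what scikit-learn's `accuracy_score` does on the per-issue true/false values for the label.

```python
from typing import List, Set


def label_accuracy(predicted: List[Set[str]], actual: List[Set[str]], label: str) -> float:
    """Accuracy for one label, counting only issues where it was predicted or actually present."""
    # Keep only the issues that are specific to this label.
    relevant = [
        (label in pred, label in act)
        for pred, act in zip(predicted, actual)
        if label in pred or label in act
    ]
    if not relevant:
        return 0.0  # hypothetical convention when the label never occurs
    # A prediction is correct when it agrees with the actual label. Issues where the label was
    # neither predicted nor present were filtered out, so agreement here means the label was
    # both predicted and actually on the issue.
    correct = sum(1 for was_predicted, was_actual in relevant if was_predicted == was_actual)
    return correct / len(relevant)


# Hypothetical example issues: predicted and actual label sets per issue.
predicted = [{"Bug"}, {"Bug", "Doc"}, {"Feature"}, set()]
actual = [{"Bug"}, {"Doc"}, {"Bug"}, {"Question"}]

print(round(label_accuracy(predicted, actual, "Bug"), 4))  # 0.3333
print(label_accuracy(predicted, actual, "Doc"))  # 1.0
```

For "Bug", the first issue is a correct hit, the second a false positive and the third a false negative, giving 1/3; the fourth issue never mentions "Bug" and is excluded.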

...

(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score)

Classification Accuracy:

Label	Accuracy	Issue Count
Performance	100%	87
Test	99.59%	245
Clojure	98.90%	12 (Test set: 1000)
Java	98.50%	2 (Test set: 1000)
Python	98.30%	170 (Test set: 1000)
C++	97.20%	2 (Test set: 1000)
Scala	96.30%	40 (Test set: 1000)
Question	97.02%	302
Doc	90.32%	155
Installation	84.07%	113
Example	80.81%	99
Bug	78.66%	389
Build	69.87%	156
onnx	69.57%	23
gluon	44.38%	160
flaky	42.78%	194
Feature	32.24%	335
ci	28.30%	53
Cuda	22.09%	86

*** In-depth analysis with precision, recall, and F1 ***

Classification report with precision, recall, and F1 score

Label	Precision	Recall	F1 Score	Issue Count
Performance	100%	100%	100%	87
Test	99.59%	100%	99.8%	245
Clojure	98.31%	98.90%	98.61%	12 (Test set: 1000)
Python	98.70%	98.30%	98.50%	170 (Test set: 1000)
Question	100%	97.02%	98.49%	302
Java	97.24%	98.50%	97.87%	2 (Test set: 1000)
C++	98.28%	97.20%	97.74%	2 (Test set: 1000)
Scala	97.37%	96.30%	96.84%	40 (Test set: 1000)
Doc	100%	90.32%	94.92%	155
Installation	100%	84.07%	91.35%	113
Example	100%	80.81%	89.39%	99
Bug	100%	78.66%	88.06%	389
Build	100%	69.87%	82.26%	156
onnx	80%	84.21%	82.05%	23
gluon	62.28%	60.68%	61.47%	160
flaky	96.51%	43.46%	59.93%	194
Feature	32.43%	98.18%	48.76%	335
ci	48.39%	40.54%	44.12%	53
Cuda	22.09%	100%	36.19%	86

For the programming-language labels, "Test set: 1000" refers to a test set of code snippets taken from files in those languages (described further below).
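Assuming the per-label accuracy is computed only over issues where the label was predicted or actually present (as described at the top of this page), true negatives never enter the calculation, so accuracy depends only on TP, FP and FN and can be recovered from precision P and recall R. As a worked check, the Feature row (P = 32.43%, R = 98.18%) gives about 32.24%, the value in the accuracy table; the same identity reproduces the other repository-label rows to within rounding, but not the programming-language rows, which were evaluated on the separate snippet test set.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad
\mathrm{Acc} = \frac{TP}{TP + FP + FN}

\frac{1}{P} + \frac{1}{R} - 1
  = \frac{TP + FP}{TP} + \frac{TP + FN}{TP} - \frac{TP}{TP}
  = \frac{TP + FP + FN}{TP}
\;\Longrightarrow\;
\mathrm{Acc} = \frac{1}{1/P + 1/R - 1} = \frac{PR}{P + R - PR}

\text{Feature: } \frac{0.3243 \times 0.9818}{0.3243 + 0.9818 - 0.3243 \times 0.9818} \approx 0.3224
```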

...

The F1 score is the harmonic mean of precision and recall, so it balances the two. The table below shows which kind of error each metric penalizes:

	Label was actually on the issue	Label was not on the issue
Label was predicted	Desired outcome (true positive)	False Positive: a high precision value means these are reduced
Label was not predicted	False Negative: a high recall value means these are reduced	Desired outcome (true negative)
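The cells of the table above map directly onto the metric definitions. The following sketch computes precision, recall and F1 from true-positive, false-positive and false-negative counts; the function name and the counts in the example are hypothetical, not taken from the tables on this page.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion counts (true negatives are not needed)."""
    # Precision: of the issues where the label was predicted, the share that actually had it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the issues that actually had the label, the share where it was predicted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the tables above.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")  # precision=0.80 recall=0.50 f1=0.615
```

The harmonic mean (0.615) sits below the arithmetic mean (0.65), so a large gap between precision and recall is penalized; this is why the Feature and Cuda rows have low F1 scores despite very high recall.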

The programming-language model was trained on large amounts of data pulled from a wide array of repositories, which is how we are able to deliver such high metrics for the programming-language labels. It uses MXNet for deep learning to learn the similarities among the languages we consider (the programming languages present in the repo). Specifically, it was trained on snippets of the data files here: https://github.com/aliostad/deep-learning-lang-detection/tree/master/data. We therefore expect this accuracy to hold when predicting labels for new issues that contain code snippets. Training was done with a 6-layer deep model in Keras-MXNet, using the 2000 files present in the repository data above for the languages we are interested in (split into snippets).

For inference, we use pure MXNet, with the model served by Model Server for Apache MXNet (MMS), a flexible and easy-to-use tool for serving deep learning models for MXNet (https://github.com/awslabs/mxnet-model-server).
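The step of creating snippets out of source files can be illustrated with a minimal sketch. The page does not say how snippets were cut or encoded, so everything here is an assumption: the snippet size (`SNIPPET_LINES`), the fixed input length (`MAX_CHARS`), the character-index encoding and all function names are hypothetical, chosen only to show one common way to turn code into fixed-length inputs for a neural classifier.

```python
from typing import Dict, List

# Hypothetical parameters: not taken from the actual training setup.
SNIPPET_LINES = 10
MAX_CHARS = 256


def make_snippets(source: str, snippet_lines: int = SNIPPET_LINES) -> List[str]:
    """Split one source file into consecutive, non-overlapping snippets of up to `snippet_lines` lines."""
    lines = [line for line in source.splitlines() if line.strip()]  # drop blank lines
    return ["\n".join(lines[i : i + snippet_lines]) for i in range(0, len(lines), snippet_lines)]


def build_vocab(snippets: List[str]) -> Dict[str, int]:
    """Map each character seen in the snippets to an index; 0 is reserved for padding/unknown."""
    chars = sorted({ch for s in snippets for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(snippet: str, vocab: Dict[str, int], max_chars: int = MAX_CHARS) -> List[int]:
    """Fixed-length vector of character indices, truncated or right-padded with zeros."""
    ids = [vocab.get(ch, 0) for ch in snippet[:max_chars]]
    return ids + [0] * (max_chars - len(ids))


# Hypothetical Java file, cut into 2-line snippets.
java_file = "public class A {\n\n  int x;\n}\n"
snippets = make_snippets(java_file, snippet_lines=2)
vocab = build_vocab(snippets)
vector = encode(snippets[0], vocab, max_chars=32)
print(len(snippets), len(vector))  # 2 32
```

Each encoded snippet, paired with its language, would form one training example; cutting files into snippets multiplies the number of examples obtainable from a fixed set of files.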

Two models, each covering a specific group of labels, are combined to deliver this capability:

...
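How two such models might be combined can be sketched as follows. This is a hypothetical illustration, not the bot's implementation: it assumes one model scores the programming-language labels and the other scores the remaining repository labels, that each returns a score per label, and that a label is kept when its score reaches a threshold. The label grouping, the 0.5 threshold and the function name are all assumptions.

```python
from typing import Dict, Set

# Hypothetical grouping: the programming-language labels from the tables on this page.
LANGUAGE_LABELS = {"Clojure", "Java", "Python", "C++", "Scala"}


def combine_predictions(
    language_scores: Dict[str, float],
    issue_scores: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Set[str]:
    """Union of the labels from two models, each trusted only for its own label group."""
    from_language_model = {
        label for label, score in language_scores.items()
        if label in LANGUAGE_LABELS and score >= threshold
    }
    from_issue_model = {
        label for label, score in issue_scores.items()
        if label not in LANGUAGE_LABELS and score >= threshold
    }
    return from_language_model | from_issue_model


labels = combine_predictions(
    {"Python": 0.93, "Java": 0.04},
    {"Bug": 0.71, "Feature": 0.32, "Python": 0.80},
)
print(sorted(labels))  # ['Bug', 'Python']
```

Restricting each model to its own group means a language score from the issue model is ignored even when it is high, so each group of labels is decided by the model built for it.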
