...

The prediction service offered by the label bot can be useful to the community for labelling issues and pull requests, based upon certain metrics and accuracy figures. In its implementation, the label bot can either auto-label certain issues or pull requests, or provide a recommendation by commenting on the appropriate threads. The bot will then be able to provide labels or label recommendations on newly opened issues and pull requests. On the prediction, we ...

Data Analysis:

Note: Training data here is limited (~13,000 issues, both closed and open). After the data cleaning process, we expect this number to be reduced considerably further.

...

Multi-label Classification:
Accuracy in predicting at least one correct label for an issue, across issues: ~87%
Accuracy in predicting all labels of an issue (i.e. an exact match between the predicted labels and the issue's labels), across issues: ~20%
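
The two figures above can be illustrated with a minimal sketch. It assumes each issue's true and predicted labels are available as sets, and it reads "at least one label" as "the prediction shares at least one label with the truth"; the function names and the toy data are hypothetical and are not taken from the bot's code.

```python
def at_least_one_accuracy(true_labels, predicted_labels):
    """Share of issues whose prediction overlaps the true labels in at least one label."""
    hits = sum(1 for truth, predicted in zip(true_labels, predicted_labels)
               if truth & predicted)
    return hits / len(true_labels)


def exact_match_accuracy(true_labels, predicted_labels):
    """Share of issues whose predicted label set equals the true label set."""
    hits = sum(1 for truth, predicted in zip(true_labels, predicted_labels)
               if truth == predicted)
    return hits / len(true_labels)


# Toy data for illustration only (not the real issue set).
true_labels = [{"Bug", "Build"}, {"Doc"}, {"Feature", "gluon"}, {"Question"}]
predicted_labels = [{"Bug"}, {"Doc"}, {"Bug"}, {"Question", "Doc"}]

print(at_least_one_accuracy(true_labels, predicted_labels))  # 0.75
print(exact_match_accuracy(true_labels, predicted_labels))   # 0.25
```

The exact-match figure is much stricter: a prediction with one extra or one missing label counts as a miss, which is one reason the two numbers can sit far apart.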


These are the labels initially chosen for the model to predict. Testing covers only the issues that carry these labels.

...

Label          Accuracy   Issue Count
Performance    100%       87
Test           99.59%     245
Question       97.02%     302
Doc            90.32%     155
Installation   84.07%     113
Example        80.81%     99
Bug            78.66%     389
Build          69.87%     156
onnx           69.57%     23
scala          67.24%     58
gluon          44.38%     160
flaky          42.78%     194
Feature        32.24%     335
C++            29.33%     75
ci             28.30%     53
Cuda           22.09%     86
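
One way a per-label table of this shape could be produced is sketched below. It assumes that a label's accuracy is the share of issues carrying that label for which the model also predicted it; that definition is an assumption, since the page does not state how the figures were computed. The helper name and the toy data are hypothetical.

```python
from collections import defaultdict


def per_label_accuracy(true_labels, predicted_labels):
    """Map each label to (accuracy, issue_count) over the issues that carry it."""
    counts = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        for label in truth:
            counts[label] += 1
            if label in predicted:
                correct[label] += 1
    return {label: (correct[label] / counts[label], counts[label])
            for label in counts}


# Toy data for illustration only (not the real issue set).
true_labels = [{"Bug"}, {"Bug", "Build"}, {"Doc"}, {"Bug"}, {"Bug"}]
predicted_labels = [{"Bug"}, {"Bug"}, {"Doc"}, {"Feature"}, {"Bug"}]

report = per_label_accuracy(true_labels, predicted_labels)
# Print rows sorted by accuracy, highest first, as in the table above.
for label, (accuracy, count) in sorted(report.items(), key=lambda kv: -kv[1][0]):
    print(f"{label:<8}{accuracy:>8.2%}{count:>6}")
```

Reporting the issue count next to each accuracy helps when reading the table: a figure computed over a few dozen issues is far less stable than one computed over several hundred.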

*** In-depth classification analysis with precision, recall, and F1 ***

Classification report with precision, recall, and F1 score:
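
For reference, the per-label quantities in such a report can be derived as in the following minimal sketch. It uses toy data rather than the bot's results, and the function name is hypothetical; it only illustrates the standard definitions of precision, recall, and F1 for a single label in a multi-label setting.

```python
def precision_recall_f1(true_labels, predicted_labels, label):
    """Return (precision, recall, f1) for one label across all issues."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if label in predicted and label in truth:
            tp += 1  # predicted and actually present
        elif label in predicted:
            fp += 1  # predicted but not present
        elif label in truth:
            fn += 1  # present but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data for illustration only (not the real issue set).
true_labels = [{"Bug"}, {"Bug"}, {"Doc"}, {"Bug", "Doc"}]
predicted_labels = [{"Bug"}, {"Doc"}, {"Doc"}, {"Bug"}]

for label in ("Bug", "Doc"):
    p, r, f = precision_recall_f1(true_labels, predicted_labels, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision matters most if the bot applies labels automatically, since a wrong label is then visible on the issue; recall matters more if the bot only posts recommendations that a person reviews.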


Data Insights:


The model may be overfitting on the 'Performance' label in particular (100% accuracy over 87 issues); this requires further examination.