
Description:

The incubator-mxnet repo currently has over 800 open issues, with new ones arriving every day. Labels make this volume manageable: contributors can filter issues to find the ones they can help with, and outside experts can discover issues in their area and become contributors. For example, a Scala expert can filter for issues about the Scala API, assess them, and easily start contributing. Today the label bot already assists with labelling issues and pull requests. Because we have the data of previously labelled issues and pull requests, we can go a step further and predict labels for new issues and pull requests, which lets the community address issues more efficiently.

Proposal:

The label bot will provide a prediction service that labels, or recommends labels for, newly opened issues and pull requests. We will gather accuracy metrics for each label, and only allow the bot to apply a label automatically when its accuracy clears a chosen threshold.

The goal of the model is precision over coverage: every automatically applied label should be correct, even if that means dropping recommended labels with low confidence scores.
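As a rough illustration of this precision-first behaviour, the sketch below only applies labels whose predicted confidence clears a per-label threshold. The `predict_probabilities` callable and the threshold values are hypothetical placeholders, not the bot's actual implementation.

```python
# Hypothetical sketch: apply only high-confidence label predictions.
# predict_probabilities() and the thresholds are placeholders, not the
# actual label bot implementation.

CONFIDENCE_THRESHOLDS = {
    "Performance": 0.90,
    "Test": 0.90,
    "Question": 0.85,
    "Bug": 0.80,
}

def recommend_labels(issue_text, predict_probabilities):
    """Return only labels whose predicted probability clears the threshold.

    Labels below the threshold are dropped so that auto-applied labels
    are very likely correct, even if some valid labels are missed.
    """
    scores = predict_probabilities(issue_text)  # e.g. {"Bug": 0.93, "ci": 0.41}
    return [
        label
        for label, score in scores.items()
        if score >= CONFIDENCE_THRESHOLDS.get(label, 1.0)  # unknown labels are never auto-applied
    ]
```

Dropping low-confidence labels rather than guessing trades recall for precision, which matches the goal above.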

Data Analysis:

The data consists of previously opened issues in the incubator-mxnet repo, both open and closed, together with the labels that have been applied to them.


For the word-embedding layer we can consider training on a much larger corpus, such as issues from the tensorflow repository, rather than relying only on MXNet's own issues.
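As a rough sketch of what that could look like, the snippet below trains word vectors on an external issue corpus with gensim (assuming gensim 4.x); `external_issue_texts` is a hypothetical stand-in for issues pulled from another repository.

```python
# Minimal sketch (assumes gensim >= 4.0): train word embeddings on a larger
# issue corpus, e.g. issues pulled from the tensorflow repository, and reuse
# the vectors when featurizing MXNet issues. external_issue_texts is a
# hypothetical list of raw issue strings.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

external_issue_texts = [
    "Model training is slow on GPU after upgrade",
    "Segfault when importing the C++ frontend",
]

# Tokenize and lowercase each issue before training.
sentences = [simple_preprocess(text) for text in external_issue_texts]

embedding_model = Word2Vec(
    sentences=sentences,
    vector_size=100,   # dimensionality of the word vectors
    window=5,
    min_count=1,
    workers=4,
)

# Look up the learned vector for a token seen during training.
vector = embedding_model.wv["gpu"]
```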



Note: the training data is limited (~13,000 issues, both closed and open), and after the data cleaning process we expect this number to be further reduced. We also have to consider that not all issues have been labelled, and even for labelled issues, not every label that applies to the issue may have been added.
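For reference, a minimal sketch of how the labelled issues can be pulled from the repo through the GitHub REST API issues endpoint; pagination and authentication are simplified here, and a full crawl of ~13,000 issues would need an API token and rate-limit handling.

```python
# Minimal sketch: pull issue titles, bodies, and labels from the repo via the
# GitHub REST API. Pagination and authentication are simplified; a real run
# against ~13,000 issues needs an API token and rate-limit handling.
import requests

def fetch_labelled_issues(repo="apache/incubator-mxnet", max_pages=2):
    issues = []
    for page in range(1, max_pages + 1):
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/issues",
            params={"state": "all", "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        for item in resp.json():
            if "pull_request" in item:
                continue  # the issues endpoint also returns pull requests
            issues.append({
                "title": item["title"],
                "body": item["body"] or "",
                "labels": [label["name"] for label in item["labels"]],
            })
    return issues
```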

Metrics:

Multi-label classification:
Accuracy of predicting at least one correct label per issue, across all issues: ~87%
Accuracy of predicting all labels of an issue (an exact match of the full label set), across all issues: ~20%
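A small sketch of how these two figures can be computed from per-issue label sets; the example data below is made up for illustration and is not the actual evaluation data.

```python
# Illustrative sketch of the two multi-label metrics above; the example
# label sets are made up and not the actual evaluation data.

actual = [{"Bug", "ci"}, {"Question"}, {"Doc", "Example"}]
predicted = [{"Bug"}, {"Question"}, {"Build"}]

# At least one predicted label matches an actual label.
at_least_one = sum(
    bool(pred & true) for pred, true in zip(predicted, actual)
) / len(actual)

# All labels match exactly (subset accuracy / exact match).
exact_match = sum(
    pred == true for pred, true in zip(predicted, actual)
) / len(actual)

print(f"at least one label correct: {at_least_one:.2%}")  # 66.67%
print(f"exact match of all labels:  {exact_match:.2%}")   # 33.33%
```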


How the data was collected:

The labels below were chosen for the model's initial predictions. Only issues tied to these labels were evaluated: an issue was counted for a given label either when the model predicted that label or when that label was among the issue's actual labels. The accuracy shown below is the share of those cases in which the label predicted by the model was one of the actual labels on the issue in the repo.


*** The accuracy metric was computed using sklearn's accuracy_score method ***

(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score)
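For example, treating each label as a binary present/absent prediction per issue, the per-label accuracy can be computed like this (the arrays below are illustrative only, not the actual evaluation data):

```python
# Illustrative sketch: per-label accuracy with sklearn's accuracy_score,
# treating each label (e.g. "Bug") as a binary present/absent prediction.
# The example arrays are made up, not the actual evaluation data.
from sklearn.metrics import accuracy_score

# 1 = label applies to the issue, 0 = it does not.
y_true = [1, 1, 0, 1, 0, 1]  # actual "Bug" label on six issues
y_pred = [1, 1, 0, 0, 0, 1]  # model's "Bug" prediction on the same issues

print(accuracy_score(y_true, y_pred))  # 0.8333...
```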

Classification Accuracy:

Label          Accuracy   Issue Count
Performance    100%       87
Test           99.59%     245
Question       97.02%     302
Doc            90.32%     155
Installation   84.07%     113
Example        80.81%     99
Bug            78.66%     389
Build          69.87%     156
onnx           69.57%     23
scala          67.24%     58
gluon          44.38%     160
flaky          42.78%     194
Feature        32.24%     335
C++            29.33%     75
ci             28.30%     53
Cuda           22.09%     86

*** In-depth analysis with precision, recall, and F1 ***

Classification report with precision, recall, and f1 score
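These per-label precision, recall, and F1 figures can be produced with sklearn's classification_report; below is a minimal sketch with made-up binary predictions for a single label.

```python
# Minimal sketch: precision/recall/F1 per label with sklearn's
# classification_report. The binary example data is illustrative only.
from sklearn.metrics import classification_report

y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]

print(classification_report(y_true, y_pred, target_names=["not Bug", "Bug"]))
```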

Data Insights:


Motivations/Conclusion:

These results show which labels the model can reliably provide given a certain accuracy threshold. The bot can help by assigning at least one label to new issues, but it may not always be able to deliver every label associated with an issue.

