...

Currently, within the incubator-mxnet repo there are over 800 open issues, and new ones are generated every day. We would like to ease this process and handle developers' issues in an appropriate manner. With the use of labelling, MXNet contributors can filter issues and offer their help on the problems users face. Labelling can also be useful for bringing in new contributors. For example, a Scala expert may know how to handle an issue posted on the MXNet repo regarding the Scala API; they would be able to assess the issue and can easily become a contributor. Today, we employ the label bot to help ease the labelling process for issues and pull requests.

The repository already provides a body of previously labelled issues and pull requests, which opens up an interesting use case: based on this data, we are able to provide insights and predictions of labels on new issues and pull requests. This mechanism gives those who raise an issue a faster response, and it allows existing and new contributors who want to help out to filter for their areas of expertise.

...

This prediction service offered by the label bot can be useful to the community for labelling issues and pull requests, provided the predictions meet agreed accuracy metrics. The bot will then be able to apply labels, or offer label recommendations, on newly opened issues and pull requests.
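As a rough illustration of the prediction step, the sketch below trains a classifier on previously labelled issue titles and recommends a label for a new issue. This is a minimal sketch only: the titles and label names are toy placeholders, and the TF-IDF plus logistic regression pipeline is an assumed baseline, not the label bot's actual model.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for (issue title, label) pairs mined from the repo history.
titles = [
    "Scala API crashes when loading a saved model",
    "Gluon training loop is slow on GPU",
    "Build fails on Windows with CMake",
]
labels = ["Scala", "Performance", "Build"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(titles, labels)

# Recommend a label for a newly opened issue.
new_issue = ["Error when calling the Scala inference API"]
print(model.predict(new_issue))  # e.g. ['Scala']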


Please also include the goal of the model: we need to make sure all auto-labelled issues have the right labels, even if that means sacrificing some recommended labels that have low confidence scores.
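A minimal sketch of that precision-first policy is shown below: only labels whose predicted confidence clears a threshold are applied automatically, and everything else is dropped (or surfaced only as a suggestion). The threshold value and label probabilities are illustrative assumptions, not tuned or measured figures.

CONFIDENCE_THRESHOLD = 0.8  # assumed value; would be tuned on held-out data

def select_labels(label_probabilities, threshold=CONFIDENCE_THRESHOLD):
    """Keep only labels the model is confident about, sacrificing low-confidence ones."""
    return [label for label, prob in label_probabilities.items() if prob >= threshold]

# Example: the model is confident about 'Scala' but unsure about the rest.
probs = {"Scala": 0.93, "Performance": 0.41, "Build": 0.07}
print(select_labels(probs))  # ['Scala']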

Data Analysis:

Provide more context here as to what the data is (and why we believe these things...)


Consider applying a much larger data set at the word embedding level, such as TensorFlow issues.
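The sketch below shows one way this could look: train word embeddings on a larger external corpus (e.g. TensorFlow issue text) and reuse the vectors for the MXNet issue classifier. It assumes gensim 4.x, and the sentences are placeholders standing in for the scraped issue text.

from gensim.models import Word2Vec

# Placeholder corpus; in practice this would be tokenized TensorFlow issue text.
external_corpus = [
    ["segmentation", "fault", "when", "loading", "saved", "model"],
    ["gpu", "memory", "leak", "during", "training"],
    ["build", "fails", "on", "windows", "with", "bazel"],
]

# min_count=1 only because the placeholder corpus is tiny.
embeddings = Word2Vec(sentences=external_corpus, vector_size=50, min_count=1, epochs=20)

# The learned vectors can then feed the issue classifier for the MXNet repo.
print(embeddings.wv["gpu"].shape)  # (50,)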



Note: The training data here is limited (~13,000 issues, both closed and open), and after the data cleaning process we expect this number to be further reduced. We also have to consider that not all issues have been labelled, and even when an issue is labelled, not all labels that may represent it have been applied.
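A minimal sketch of the cleaning step described above: of the roughly 13,000 issues, keep only those that carry at least one label, since unlabelled issues cannot serve as training examples. The issue records here are toy placeholders, not the repository's real data.

# Placeholder issue records; real ones would come from the GitHub API.
issues = [
    {"number": 101, "title": "Scala API crash", "labels": ["Scala", "Bug"]},
    {"number": 102, "title": "Question about Gluon", "labels": []},
    {"number": 103, "title": "Slow training on GPU", "labels": ["Performance"]},
]

labelled = [issue for issue in issues if issue["labels"]]
print(f"kept {len(labelled)} of {len(issues)} issues for training")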

...