This is the initial design of the ML-based GitHub bot. For more details, please refer to the technical doc.

1. Problem

The incubator-mxnet repository has many open issues. Labeling issues helps contributors who know a particular area pick up relevant issues and help users. However, all issues are currently labeled manually, which is time-consuming, and maintainers have to @-mention a committer every time labels need to be added. This bot will help automate and simplify the issue-labeling process.

2. Goal

Part I - Email Bot
Send daily GitHub issue reports to the mailing list:
- Count of newly opened and closed issues in the last 7 days
- Average and worst response time for all new issues
- List of new issues without a response, with links
- List of issues without a response that are outside the SLA

Part II - Predict labels automatically for unlabeled issues
Send another version of the daily GitHub issue report to the mailing list:
- Count of newly opened and closed issues in the last 7 days
- List of issues without a response
- List of unlabeled issues
- Predicted labels for unlabeled issues
- Pie chart of the top 10 labels across all issues

Build a web server that responds to GET/POST requests and maintains itself:
- Predict labels: when it receives a GET/POST request with an issue ID, it sends the predicted labels back.
- Self-maintenance: it retrains the machine learning models every 24 hours.
Part III - Label Bot
This bot helps non-committers add labels to GitHub issues.
- Recognize commands in comments, e.g. "@mxnet-label-bot, please add labels: [A, B]".
- Add labels to incubator-mxnet issues using a committer's credentials.

3. Approach

Part I - Email Bot
An Amazon CloudWatch event triggers a Lambda function on a fixed schedule (e.g. 9am every Monday). When the Lambda function runs, it generates the issue report and sends it to the mailing list. Figure 1 shows the email bot architecture and Figure 2 shows sample email content.

Figure 1 Email Bot Design

Figure 2 Demo Email Content
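
As an illustration of the report step, the figures listed under Goal could be computed roughly as follows. This is a minimal sketch, not the bot's actual code: the issue dictionary fields (`url`, `created_at`, `first_response_at`, `closed_at`), the function name `summarize_issues`, and the 5-day SLA default are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def summarize_issues(issues, now, window_days=7, sla_days=5):
    """Compute report figures from issue dicts (hypothetical schema)."""
    window_start = now - timedelta(days=window_days)

    opened = [i for i in issues if i["created_at"] >= window_start]
    closed = [i for i in issues
              if i["closed_at"] is not None and i["closed_at"] >= window_start]

    # Response time in hours, only for new issues that already have a reply.
    hours = [(i["first_response_at"] - i["created_at"]).total_seconds() / 3600
             for i in opened if i["first_response_at"] is not None]

    # Open issues that nobody has answered yet.
    unanswered = [i for i in issues
                  if i["first_response_at"] is None and i["closed_at"] is None]
    sla = timedelta(days=sla_days)  # assumed SLA; the real value is not stated

    return {
        "opened": len(opened),
        "closed": len(closed),
        "avg_response_hours": sum(hours) / len(hours) if hours else None,
        "worst_response_hours": max(hours) if hours else None,
        "new_unanswered": [i["url"] for i in unanswered
                           if i["created_at"] >= window_start],
        "outside_sla": [i["url"] for i in unanswered
                        if now - i["created_at"] > sla],
    }


# Example call with a single made-up issue.
report = summarize_issues(
    [{"url": "issue-1", "created_at": datetime(2018, 8, 14),
      "first_response_at": None, "closed_at": None}],
    now=datetime(2018, 8, 15),
)
```

In a Lambda deployment, the handler would fetch the issues from GitHub, call a function like this, and format the resulting dictionary into the email body.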

Part II - Predict labels automatically for unlabeled issues
This part uses machine learning models to predict labels and sends the predictions by email. Figure 3 shows the architecture and Figure 4 shows sample email content.

Figure 3 Lambda with Elastic Beanstalk
Figure 4 Demo Email Content
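
To make the request/response contract of the prediction server concrete, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the `/predict` path, the `issue` query parameter, the port, and the `predict_labels` placeholder, which stands in for the trained model. Only GET is handled; POST and the 24-hour retraining job are left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def predict_labels(issue_id):
    """Placeholder for the trained model (hypothetical); returns a fixed label."""
    return ["Bug"]


def build_response(path, predictor=predict_labels):
    """Turn a request path such as /predict?issue=123 into (status, JSON body)."""
    query = parse_qs(urlparse(path).query)
    raw = query.get("issue", [""])[0]
    try:
        issue_id = int(raw)
    except ValueError:
        return 400, json.dumps({"error": "issue must be an integer"})
    return 200, json.dumps({"issue": issue_id, "labels": predictor(issue_id)})


class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the parsing and prediction in `build_response` separates the HTTP plumbing from the logic, so the Lambda function or a unit test can exercise it without opening a socket.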

Part III - Label Bot
The label bot helps non-committers add labels. A contributor mentions the bot in a comment such as "@mxnet-label-bot, please add labels: [A, B]". The bot then recognizes the notification and adds the requested labels.
All code runs in a Lambda function, which a CloudWatch event triggers every 5 minutes. When the Lambda function runs, it reads the valid notifications, extracts the label names from the comments, and adds the labels. Figure 5 shows the architecture.
Figure 5 Label Bot Design
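
The command-recognition step can be sketched with a regular expression. This is one possible approach under stated assumptions, not the bot's confirmed implementation: the pattern `COMMAND` and the function name `extract_labels` are made up for the example, and the only command format it targets is the one quoted above.

```python
import re

# Matches "@mxnet-label-bot ... add labels: [A, B]" (hypothetical pattern).
COMMAND = re.compile(
    r"@mxnet-label-bot\b.*?add labels?\s*:\s*\[([^\]]*)\]",
    re.IGNORECASE | re.DOTALL,
)


def extract_labels(comment):
    """Return the label names requested in a comment, or [] if there is no command."""
    match = COMMAND.search(comment)
    if match is None:
        return []
    # Split the bracket contents on commas and drop empty entries.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


labels = extract_labels("@mxnet-label-bot, please add labels: [A, B]")
```

A real bot would probably also check the extracted names against the repository's existing labels before applying them, which ties in with the permission concern listed under Technical Challenges.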

4. Multi-label classification

Each instance can be assigned multiple categories at once, so this type of problem is known as multi-label classification: the target is a set of labels rather than a single class. Multi-label classification problems are common in practice, for example in audio categorization, image categorization, and bioinformatics. This project focuses on text categorization, because labels are learned from the issue title and issue description.
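
A common way to represent such targets is a multi-hot vector with one 0/1 entry per known label. The sketch below shows the idea in plain Python; the function name `multi_hot` and the sample labels are illustrative assumptions.

```python
def multi_hot(label_sets):
    """Encode each issue's label set as a 0/1 vector over the sorted label vocabulary."""
    vocabulary = sorted({label for labels in label_sets for label in labels})
    rows = [[1 if label in labels else 0 for label in vocabulary]
            for labels in label_sets]
    return vocabulary, rows


# Three issues: two labels, one label, and no label at all.
vocabulary, rows = multi_hot([{"Bug", "Python"}, {"Doc"}, set()])
```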

...

Problem Transformation

Binary Relevance
This is the simplest technique: it treats each label as a separate binary classification problem.
Classifier Chains
The first classifier is trained on the input data alone, and each subsequent classifier is trained on the input features plus the outputs of all previous classifiers in the chain.
Label Powerset
This transforms the problem into a multi-class problem in which a single multi-class classifier is trained on all unique label combinations found in the training data.
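
The three transformations differ mainly in how they reshape the training targets. The sketch below shows that reshaping in plain Python, without training any model. The function names and sample data are assumptions made for illustration, and the chain variant uses the true values of earlier labels as extra inputs, which is the usual choice at training time.

```python
def binary_relevance_targets(label_sets, labels):
    """One 0/1 target list per label; each would train its own binary classifier."""
    return {label: [int(label in s) for s in label_sets] for label in labels}


def label_powerset_targets(label_sets):
    """Map each distinct label combination to one class id for a multi-class model."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        # A new combination gets the next free id; a known one reuses its id.
        targets.append(class_ids.setdefault(key, len(class_ids)))
    return targets, class_ids


def chain_inputs(features, label_sets, labels):
    """Inputs for a classifier chain: the k-th classifier sees the original
    features plus the values of the k labels that come before it."""
    inputs = []
    for k in range(len(labels)):
        previous = labels[:k]
        inputs.append([list(x) + [int(p in s) for p in previous]
                       for x, s in zip(features, label_sets)])
    return inputs


label_sets = [{"Bug", "Python"}, {"Doc"}, {"Bug", "Python"}]
labels = ["Bug", "Doc", "Python"]
per_label = binary_relevance_targets(label_sets, labels)
powerset, class_ids = label_powerset_targets(label_sets)
chained = chain_inputs([[0.5], [0.1], [0.9]], label_sets, labels)
```

Binary relevance ignores correlations between labels, label powerset captures them but needs examples of every combination, and classifier chains sit in between. With limited training data, the number of distinct combinations seen by label powerset is worth checking first.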

Algorithm adaptation
- Manual: rule-based
- Automatic:
  - Vector space model based
    - Prototype-based
    - K-nearest neighbor
    - Decision tree
    - Neural networks
    - Support vector machines
  - Probabilistic or generative model based
    - Naive Bayes classifier

5. Technical Challenges

Restrict permissions of this bot to avoid unexpected operations.
Training data is limited.

6. Reference

...
