...

Part I - Email Bot
An Amazon CloudWatch event triggers a Lambda function at a fixed frequency (for example, 9 am every Monday). When the Lambda function runs, it generates the issue report and sends it to the mailing list. Figure 1 shows the email bot design architecture and Figure 2 shows the demo email content.
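A minimal sketch of the scheduled handler, assuming a CloudWatch Events rule with the schedule expression `cron(0 9 ? * MON *)` (09:00 UTC on Mondays). The function names, the issue record fields (`state`, `labels`), and the injected `fetch_issues` and `send_email` callables are hypothetical; they stand in for the GitHub and mail integrations, which this design does not specify.

```python
from collections import Counter

# Assumed schedule for the CloudWatch Events rule: 09:00 UTC every Monday.
SCHEDULE_EXPRESSION = "cron(0 9 ? * MON *)"


def build_report(issues):
    """Summarize a list of issue dicts into a plain-text report body."""
    open_issues = [i for i in issues if i["state"] == "open"]
    unlabeled = [i for i in open_issues if not i["labels"]]
    label_counts = Counter(label for i in open_issues for label in i["labels"])

    lines = [
        f"Open issues: {len(open_issues)}",
        f"Open issues without labels: {len(unlabeled)}",
        "Open issues per label:",
    ]
    # Most common labels first; ties are broken alphabetically for stable output.
    for label, count in sorted(label_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"  {label}: {count}")
    return "\n".join(lines)


def make_handler(fetch_issues, send_email):
    """Build a Lambda-style handler from hypothetical fetch and send callables."""
    def handler(event, context):
        report = build_report(fetch_issues())
        send_email(subject="Weekly issue report", body=report)
        return {"status": "sent"}
    return handler
```

Passing the fetch and send steps in as callables keeps the report logic testable without network access; in a deployment they would wrap the GitHub API and the mail service.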

Figure 1 Email Bot Design


Figure 2 Demo Email Content

Part II - Predict labels automatically for unlabeled issues
Amazon CloudWatch event (a) triggers Lambda function (a) at 9 am every Monday. Lambda function (a) then generates an email and writes the data of unlabeled issues into a Google Sheet, which every team member can view and fill in with labels. Twelve hours later, a second Lambda function (b) runs and adds the labels to the corresponding issues. The bot should have restricted permissions to avoid unexpected operations. Figure 3 shows the bot design, Figure 4 shows the demo email content, and Figure 5 shows the demo Google Sheet content.

Figure 3 Label Bot Design

...

This part uses machine learning models to predict labels and sends the predictions by email.

Figure 3 Lambda with Elastic Beanstalk

Figure 4 Demo Email Content

Part III - Label Bot
This label bot helps non-committers add labels. A contributor mentions @mxnet-label-bot in a comment such as "@mxnet-label-bot, please add labels: [A, B]". The bot then recognizes the notification and adds the requested labels to the issue.
All code runs in a Lambda function. A CloudWatch event triggers this Lambda function every 5 minutes. When the Lambda function runs, it reads the valid notifications, extracts the label information from the comments, and then adds the labels. Figure 5 shows the architecture.
Figure 5 Label Bot Design
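The comment-parsing step can be sketched as follows. This is an illustration only: the regular expression is one possible reading of the request format quoted above, and `extract_labels` and `filter_allowed` are hypothetical helper names. Limiting requests to labels that already exist in the repository is an assumption, suggested by the requirement that the bot avoid unexpected operations.

```python
import re

BOT_NAME = "@mxnet-label-bot"

# Matches the request format quoted above: "... add labels: [A, B]".
_REQUEST = re.compile(r"add\s+labels?\s*:\s*\[([^\]]*)\]", re.IGNORECASE)


def extract_labels(comment):
    """Return the labels requested in a comment, or [] if it is not a valid request."""
    if BOT_NAME not in comment:
        return []
    match = _REQUEST.search(comment)
    if match is None:
        return []
    labels = [part.strip() for part in match.group(1).split(",")]
    return [label for label in labels if label]


def filter_allowed(requested, repo_labels):
    """Keep only labels that already exist in the repository (case-insensitive)."""
    known = {label.lower(): label for label in repo_labels}
    return [known[name.lower()] for name in requested if name.lower() in known]
```

For example, `extract_labels('@mxnet-label-bot, please add labels: [A, B]')` yields the two label names `A` and `B`, while a comment that does not mention the bot yields an empty list.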

4. Multi-label classification

...

Each instance can be assigned multiple categories, so problems of this type are known as multi-label classification problems, where we have a set of target labels. Multi-label classification problems are very common in the real world, for example in audio categorization, image categorization, and bioinformatics. Our project mainly focuses on text categorization because labels are learned from the issue title and issue description.

Steps to achieve it

Step 1: Retrieve Data
Extract data from GitHub issues into JSON format.

Step 2: Data Cleaning
Data cleaning is important because it keeps the valuable information, such as keywords, and reduces the noise.
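A minimal cleaning sketch, assuming that noise means URLs, punctuation, and common stop words. The `clean_text` name and the short `STOP_WORDS` set are hypothetical; the actual cleaning rules are not specified here.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "in", "i"}


def clean_text(text):
    """Lowercase issue text, drop URLs and punctuation, and remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # punctuation and symbols
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

The function returns a list of tokens, which is a convenient input for the vector representation step.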

Step 3: Vector Representation
Classifiers and learning algorithms cannot directly process the text documents in their original form. During a preprocessing step, the documents are converted into a more manageable representation. Typically, the documents are represented by feature vectors.
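One simple feature-vector representation is a bag-of-words term count, sketched below. This is an assumption for illustration: the page does not say which representation the project uses, and `build_vocabulary` and `vectorize` are hypothetical names.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return the term-count vector of one tokenized document."""
    counts = Counter(document)
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # tokens not in the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector
```

Each issue becomes a fixed-length list of counts, one position per vocabulary word, which a classifier can consume directly.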

...

Problem Transformation

Binary Relevance
This is the simplest technique: it treats each label as a separate binary classification problem.
Classifier Chains
The first classifier is trained on the input data alone, and each subsequent classifier is trained on the input space plus the outputs of all previous classifiers in the chain.
Label Powerset
Transform the problem into a multi-class problem in which one multi-class classifier is trained on all unique label combinations found in the training data.
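The target transformations behind Binary Relevance and Label Powerset can be sketched as below. The function names are hypothetical, and only the construction of training targets is shown; Classifier Chains is omitted because it depends on the outputs of already trained classifiers.

```python
def binary_relevance_targets(label_sets, all_labels):
    """One 0/1 target column per label; each column trains its own binary classifier."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def label_powerset_targets(label_sets):
    """Map each distinct label combination to one class id for a multi-class classifier."""
    class_ids = {}
    targets = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids follow order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids
```

With Binary Relevance the number of classifiers grows with the number of labels, while with Label Powerset the number of classes grows with the number of distinct label combinations seen in training.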

Algorithm adaptation
Manual: rule-based
Automatic:

Vector space model based

Prototype-based
K-nearest neighbor
Decision-tree
Neural Networks
Support Vector Machines

Probabilistic or generative model based

Naive Bayes classifier

...

5. Technical Challenges

Restrict permissions of this bot to avoid unexpected operations.
Training data is limited.

...

6. Reference

...
