...

  • Problem

  • Introduction
  • Goal
    • Part I - Email Bot
    • Part II - Label Bot
    • Part III - Determine labels automatically
  • Approach
    • Part I - Email Bot
    • Part II - Label Bot
    • Part III - Determine labels automatically
  • Technical Challenges
  • References

1.

...

Problem

The incubator-mxnet repository has many open issues. Labeling them helps contributors who know a particular area pick up relevant issues and help users. However, all issues are currently labelled manually, which is time-consuming, and maintainers have to @-mention a committer every time a label needs to be added. This bot will help automate and simplify the issue-labeling process.

...

  • Part I - Email Bot
    Create weekly email to dev@mxnet.incubator.apache.org:
    (Instead of sending emails directly to dev@, another option is to create a separate email alias and ask people who are interested in the weekly reports to join.)
    • Count of newly opened issues and closed issues in last 7 days
    • Average and worst response time for all new issues
    • List of new issues with no response yet, with links
    • List of issues with no response that are outside the SLA
  • Part II - Label Bot
    Create a bot to add labels for incubator-mxnet issues
    • Create weekly email to internal team members:
      • Count of newly opened issues and closed issues in last 7 days
      • List of unlabelled issues
      • List of issues with no response yet
      • Pie chart with top 10 labels for all issues
      • Pie chart with top 10 labels for newly opened issues in last 7 days. (Add "unlabelled" as a segment)
      • A line/bar graph with week over week statistics of the number of issues closed and the number of issues opened
    • Generate a spreadsheet with detailed information on unlabelled issues. Every team member should be able to view it and fill in labels.
    • Read the filled-in labels and add them to the corresponding issues.
  • Part III - Determine labels automatically from GitHub issues: Specifications
    • Identify the programming language each issue relates to (e.g. Python, C/C++, Scala)
    • Multi-label classification
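As a rough illustration of the Part III specification, a rule-based baseline could assign zero or more language labels to an issue by matching keywords in its title and body. This is only a sketch: the rule table `LANGUAGE_RULES`, its patterns, and the function name `predict_labels` are hypothetical and not part of the original design.

```python
import re

# Hypothetical keyword rules; label names and patterns are illustrative only.
LANGUAGE_RULES = {
    "Python": [r"\bpython\b", r"\bimport mxnet\b", r"\.py\b", r"\bpip install\b"],
    "C++": [r"\bc\+\+", r"\.cc\b", r"\.cpp\b", r"#include\b"],
    "Scala": [r"\bscala\b", r"\bsbt\b", r"\.scala\b"],
}


def predict_labels(title, body):
    """Return every language label whose rules match the issue text.

    An issue can match several rule sets, so the result is multi-label.
    """
    text = f"{title}\n{body}".lower()
    labels = []
    for label, patterns in LANGUAGE_RULES.items():
        if any(re.search(pattern, text) for pattern in patterns):
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(predict_labels("Segfault in python binding",
                         "#include <mxnet/c_api.h> in foo.cc"))
```

A baseline like this needs no training data, which may be useful while labelled examples are scarce, but it will miss issues that never mention a language explicitly.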

3. Approach

  • Part I - Email Bot
    An Amazon CloudWatch event will trigger a Lambda function on a fixed schedule (e.g. 9am every Monday). Once the Lambda function runs, the issue report is generated and sent to the mailing list. Figure 1 shows the bot design and Figure 2 shows demo email content.
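The report-generation step that the Lambda function runs could look roughly like the sketch below. It computes the weekly figures from issues that have already been fetched; fetching from GitHub and sending the email are left out. The function name `build_report`, the issue dictionary keys (`created_at`, `closed_at`, `first_response_at`, `url`), and the `sla` parameter are assumptions made for illustration, not names from the actual bot.

```python
from datetime import datetime, timedelta


def build_report(issues, now, sla):
    """Compute weekly issue statistics.

    issues: list of dicts with hypothetical keys 'created_at' (datetime),
            'closed_at' (datetime or None), 'first_response_at'
            (datetime or None) and 'url' (str).
    now:    datetime at which the report is generated.
    sla:    timedelta after which an unanswered issue counts as outside SLA.
    """
    week_ago = now - timedelta(days=7)

    new = [i for i in issues if i["created_at"] >= week_ago]
    closed = [i for i in issues
              if i["closed_at"] is not None and i["closed_at"] >= week_ago]

    # Response time is measured only for new issues that received a response.
    response_times = [i["first_response_at"] - i["created_at"]
                      for i in new if i["first_response_at"] is not None]
    average = (sum(response_times, timedelta()) / len(response_times)
               if response_times else None)
    worst = max(response_times) if response_times else None

    unanswered_new = [i["url"] for i in new
                      if i["first_response_at"] is None]
    # Open issues of any age that have waited longer than the SLA.
    outside_sla = [i["url"] for i in issues
                   if i["first_response_at"] is None
                   and i["closed_at"] is None
                   and now - i["created_at"] > sla]

    return {
        "opened": len(new),
        "closed": len(closed),
        "average_response": average,
        "worst_response": worst,
        "unanswered_new": unanswered_new,
        "outside_sla": outside_sla,
    }
```

In a deployment along the lines described above, the Lambda handler would call a function like this with the current time and then format the returned dictionary into the email body.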

...

  • Problem Transformation
    • Binary Relevance
      This is the simplest technique: it treats each label as a separate binary (single-label) classification problem.
    • Classifier Chains
      The first classifier is trained on the input data only; each subsequent classifier is trained on the input space plus the outputs of all previous classifiers in the chain.
    • Label Powerset
      Transform the problem into a multi-class problem in which a single multi-class classifier is trained on all unique label combinations found in the training data.
  • Algorithm adaptation
    Manual:
    • Rule-based
    Automatic:
    • Vector space model based
      • Prototype-based
      • K-nearest neighbor
      • Decision-tree
      • Neural Networks
      • Support Vector Machines
    • Probabilistic or generative model based
      • Naive Bayes classifier
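The three problem-transformation techniques listed above differ mainly in how the training targets are built. The sketch below shows only that target construction in plain Python, without any classifier; the function names and the example labels are illustrative assumptions. For the classifier chain it assumes the true values of earlier labels are appended to the features during training, which is one common way to set up a chain.

```python
def binary_relevance_targets(label_sets):
    """Binary Relevance: one independent 0/1 target vector per label."""
    all_labels = sorted(set().union(*label_sets))
    return {label: [1 if label in labels else 0 for labels in label_sets]
            for label in all_labels}


def classifier_chain_inputs(features, label_sets, order):
    """Classifier Chains: inputs for each link in the chain.

    The classifier for order[k] sees the original features plus the
    0/1 values of the labels order[0..k-1] (true values at training time).
    """
    inputs = {}
    for position, label in enumerate(order):
        earlier = order[:position]
        inputs[label] = [
            list(x) + [1 if prev in labels else 0 for prev in earlier]
            for x, labels in zip(features, label_sets)
        ]
    return inputs


def label_powerset_targets(label_sets):
    """Label Powerset: one class id per unique label combination."""
    class_ids = {}
    targets = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


if __name__ == "__main__":
    # Hypothetical labelled issues: one feature value and a label set each.
    features = [[0.1], [0.2], [0.3], [0.4]]
    label_sets = [{"Python", "Bug"}, {"Scala"}, {"Python", "Bug"}, {"Python"}]
    print(binary_relevance_targets(label_sets))
    print(classifier_chain_inputs(features, label_sets, ["Python", "Bug"]))
    print(label_powerset_targets(label_sets)[0])
```

Label Powerset creates one class per observed combination, so with limited training data many combinations may have very few examples; Binary Relevance avoids this but ignores correlations between labels.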

4.

...

Technical Challenges

  • Restrict permissions of this bot to avoid unexpected operations.
  • Training data is limited.

5. References

...