This document provides an overview of GitHub Issues/Discussions features and some notes based on the experience of the Apache Airflow project in switching from JIRA to GitHub Issues and GitHub Discussions over the last 2 years.

The document can be used by other projects as inspiration for their own approach to issue management: which capabilities GitHub Issues/Discussions offer and how they can be used to make issue management more efficient.

Why GitHub Issues/Discussions

Personal comment from Apache Airflow team:

Apache Airflow switched from GitHub Issues to JIRA years ago due to the poor capabilities of GitHub Issues, but after several years we switched back to GitHub Issues, later added GitHub Discussions, and we NEVER looked back. Not a minute. Not even a second. Absolutely no-one missed JIRA. Not by far.

That was such an amazing improvement in the overall workflow and contributor engagement. I can't even imagine how we would be able to run the project with JIRA.

The overall experience, the integration level, the overhead needed to manage JIRA issues, the dual logging-in and the syncing between the two were absolutely unmanageable for us. With GitHub Issues we chose to base our "change tracking" on PR # rather than issue # (issues are optional), and it made a whole world of difference.

This is especially true recently, with GitHub Discussions added to the mix and the ability to convert issues into discussions (and back) if they are not real issues.

Best Practices

This chapter describes best practices for issue management with GitHub Issues, based on the experience of Apache Airflow.

GitHub Issues and Discussions

GitHub Discussions are great because by definition they are not issues that should be closed, but discussions that might die out or be converted into real issues when we come to the conclusion that they are real issues. We found GitHub Discussions pretty useful and active, and more and more users are opening discussions rather than issues. This keeps the "issues" down to the "real" issues. We have also implemented our GitHub issue templates in a way that suggests to users that they should open a discussion rather than an issue if they cannot give enough information or a reproduction scenario.

Using templates for GitHub Issues

We have had really nice templates for GitHub Issues as of recently. This is another benefit of GH Issues: they have nicely working Issue Forms, which do a FANTASTIC job of raising the quality of our issues. For example, in the forms we instruct users that if they have no reproducible steps, they should open a GitHub Discussion instead, and this has already happened multiple times. One of the options in the issue form configuration is to provide a "BUTTON" instead of a form for some types of issues, which links to an external site.

We HEARTILY recommend introducing well thought-out and well prepared issue forms when you move to GH Issues. We already see a tremendous improvement in the quality of reported issues, and a lot more GitHub Discussions opened instead of issues. The nice thing about those forms is that they introduce a bit of "friction". It's not just copy & paste or typing out your frustration: you HAVE TO choose the version of Airflow, you HAVE TO describe your OS, you HAVE TO choose the deployment. And if you did not fill in the reproducibility steps, there is a clear "No response was given to that" in your issue, which in the VAST majority of cases immediately qualifies the issue to be converted to a discussion (which we often do). This is especially so since during issue entry we explicitly tell users that "bugs without reproduction steps should be opened as discussions instead", and we even have links there so that the user can click and create a discussion easily from the issue form.

You can take a look at our issue templates here: https://github.com/apache/airflow/tree/main/.github/ISSUE_TEMPLATE
And you can try out the experience of entering an Airflow issue here: https://github.com/apache/airflow/issues/new/choose

These are the observations we made when we designed the templates (based on our earlier trials of Markdown templates):

  • with the standard MARKDOWN templates we had many issues where people did not provide useful information (version of Airflow, operating system etc.)
  • we had a number of issues where users would simply delete the Markdown template content straight away and replace it with their own issue description, without "reproducible steps", or really just ask a question about their deployment problems without even attempting to investigate them.
  • we ended up with many "discussion" kinds of questions posted as issues. We would then "convert" such issues into discussions, but that required a maintainer's comment and explanation. Mostly this was because the users did not even know they could (and should) open a discussion instead.
  • the Markdown templates were difficult to read and fill in. It was not clear what you should do with the parts which were relevant. We left instructions in the comments, which were sometimes left in and sometimes deleted, so generally the issues were "structur-ish" rather than "structured".
  • people often opened Airflow 1.10 issues even though it reached End-Of-Life in June (they should open discussions instead, which is still great because even if there is no way we will handle the issues, either we or other users can still help them with a workaround or even directing t)

These are the basic design principles we followed when designing the new templates based on those observations:

  • We defined forms with required fields that cannot be "skipped".
  • When you do not fill in a "textarea" entry, it is marked as "Not Provided" rather than deleted, which is clear information that something is missing.
  • We added helpful comments and hints, as well as an explanation of the cases in which you should use GitHub Discussions instead of issues, with a direct link to open a discussion (this includes the inability to select version 1.10).
  • We've added a logo and a welcoming message to "soften" the more formalized form entry.
  • We made it clear that if there are no reproduction steps, the users should open a GitHub Discussion instead.
  • We've added more issue types. We've separated "Airflow Core", "Airflow Providers" (we have more than 70 providers that extend Airflow's capabilities) and "Airflow Docs" issues, automatically applying the correct labels, so we do not have to triage and assign issues to different "types" ourselves. Users do it for us when they open an issue of the specific type.
  • We also created a "maintainer only" issue type which allows entering pretty much free-form information (for tasks/todos etc.), and we've added a required "checkbox" to confirm you are a maintainer, to discourage people from using it to raise their "free form" questions there. We wanted it to be "easy" for committers to enter such a "free form" issue but "not easy" for users to skip structured information, while at the same time guiding them to use "Discussions", where it is much "easier" to enter any content and ask questions.
  • What is more, this structured form will allow us to automate some things if we find it is needed. For example, if someone submits an entry without providing "reproduction steps", we can write a bot to automatically convert such an issue into a discussion, or automatically close an issue if someone opens a "free-form" one while not being a maintainer.
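
The decision such a bot would have to make can be sketched roughly as below. This is only an illustration, not something Airflow runs: it assumes that a submitted issue form is rendered as "### <field label>" headings followed by the answers, and that an empty optional field shows up as "_No response_". The field label "How to reproduce" and both function names are hypothetical, so check them against a real issue created from your own form.

```python
import re

# Assumption: an empty optional field in a submitted issue form is rendered
# with this placeholder. Verify against a real issue before relying on it.
NO_RESPONSE = "_No response_"


def parse_issue_form(body: str) -> dict[str, str]:
    """Split a rendered issue-form body into {field label: answer}."""
    fields: dict[str, str] = {}
    # With a capture group, re.split returns
    # [preamble, label1, text1, label2, text2, ...].
    parts = re.split(r"^### +(.+?)\s*$", body, flags=re.MULTILINE)
    for label, text in zip(parts[1::2], parts[2::2]):
        fields[label.strip()] = text.strip()
    return fields


def should_convert_to_discussion(body: str, field: str = "How to reproduce") -> bool:
    """True when the (hypothetical) reproduction field is absent or left empty."""
    answer = parse_issue_form(body).get(field, "")
    return answer in ("", NO_RESPONSE)
```

A bot would call should_convert_to_discussion() on the body of a newly opened issue and only then decide whether to comment, label or convert it; the conversion itself is deliberately left out of the sketch.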

Dealing with security reports inside GitHub Issues

GitHub Issue templates can be configured to allow different kinds of issues. Some of the entry types might be links to other places, including a link to the security policy https://github.com/apache/airflow/security/policy which clearly states that no GH issues should be opened, but that the regular ASF security process should be followed (with an email to security@a.o).

Approach for triaging issues

  • We triage and respond to issues pretty quickly and "aggressively". I.e. when there is not enough information or the issue is very likely to be caused by an external factor, we move the issue to discussions, explaining what's missing, what the author should do and what information should be provided, and we add that we might consider moving it back to an issue as soon as more information is provided. I found that moving issues to discussions in this case works much better for motivating the user to add more information (and it saves the hassle of maintaining status and closing the issues later).
  • when the user raises an issue which is really a question, we actively and quickly redirect the user to "Discussions" rather than issues.
  • we have an automated stale-bot that closes inactive issues and PRs (a notice after 30 days of inactivity, closing 7 days later)
  • we have a triage team that meets virtually from time to time and actively reviews and classifies the issues (adds labels), but also runs some stats on which areas are "under-staffed". They meet semi-regularly, discuss and send some summaries.

  • the rule we have is that we do not need issues at all. People are encouraged (in the docs and workshops) to open PRs directly rather than issues, and we always refer to the PR #, not the issue #, in changelogs
  • we mark the issues that are simple as "good-first-issue", which then land in http://github.com/apache/airflow/contribute . More often than not we have people commenting "Hey I want to implement this, can you assign me?", which we do pretty much immediately when they ask. That often works and we have new contributors :)
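
The stale-bot timing from the list above (notice after 30 days of inactivity, closing 7 days later) boils down to a small rule. The sketch below only illustrates that rule and is not the configuration of the bot we actually use; the function name is hypothetical, and it assumes the notice itself does not count as activity.

```python
from datetime import datetime, timedelta

# Thresholds taken from the text: a notice after 30 days of inactivity,
# closing 7 days after the notice if nothing happens in between.
NOTICE_AFTER = timedelta(days=30)
CLOSE_AFTER_NOTICE = timedelta(days=7)


def stale_action(last_activity: datetime, now: datetime) -> str:
    """Return "none", "notify" or "close" for an issue or PR."""
    idle = now - last_activity
    if idle >= NOTICE_AFTER + CLOSE_AFTER_NOTICE:
        return "close"
    if idle >= NOTICE_AFTER:
        return "notify"
    return "none"
```

Any new comment or commit resets last_activity, so an author who reacts to the notice keeps the issue or PR open.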

Recruiting contributors

We use the occasion of users opening issues to actively recruit new contributors.

  • we continuously encourage new users to contribute, and we add more committers, especially in the areas that are "under-staffed" (recently the UI committers "team" and the "Kubernetes" team have greatly increased in capacity, and it immediately improved the situation there)

  • what helps there is that some of those committers are full-time employed or part-time paid as freelancers by important stakeholders in the project (Astronomer, Google). Also those stakeholders are fully aware of the value it brings, so they gladly pay the committers for their community effort, even if it is not directly responding to their needs
  • we added an "Are you willing to submit PR?" question to the issue template. When the issue is relatively simple and the user says "yes", we assign the user to it. When the answer is missing, we actively ask the user whether they are willing to submit a PR. More often than not, the users are willing when encouraged (at least initially).

  • we have a "really quick to start" development environment for Airflow (called Breeze) that we continuously improve to make it easier to start contributing.
  • we run semi-regular workshops for new contributors. For example, today we have the "first time contributor's workshop" https://airflowsummit.org/sessions/2021/workshop-contributing-apache-airflow/ - 3 hours hands-on where we teach new contributors how to contribute. This is I think the 5th or 6th time we are doing it (we had a few physical events, and over the last 1.5 years we had I think 4 online ones). This time we have 20 people signed up, from literally all over the world (and BTW, all proceeds from that cheap 50 USD workshop go to the Apache Software Foundation as a donation), and we mention it to the contributors. Another example is the PyCon Taiwan Sprint, where we held an 8-hour workshop.
  • We have "community" days at the Summit where we have talks encouraging people to contribute, and we often send people to those. Examples here:

    https://airflowsummit.org/sessions/2021/contributing-journey-becoming-leading-contributor/ - the road of Kaxil, an Airflow PMC member, through committership
    https://airflowsummit.org/sessions/2021/contributing-first-steps/ - the first steps of a fresh contributor to Airflow who shared his experiences
    https://airflowsummit.org/sessions/2021/dont-have-to-wait/ - "You don't have to wait for someone to fix it for you" - the talk from one of the committers to Airflow, Leah, and her co-worker Rachel

  • And we have quite a few more talks for those who want to start contributing to Airflow:

    https://airflowsummit.org/sessions/2021/guide-airflow-architecture/  - The newcomer's guide to Airflow Architecture

Future

GitHub Issues were already super-useful when we switched 2 years ago, but now, with Issue Forms and GitHub Discussions together, they are GREAT. I am also discussing with GitHub the possibility of using the (optional) new "tabular" GitHub Issues experience they introduced recently (https://github.blog/2021-06-23-introducing-new-github-issues/). It is in the private beta stage now and not yet available for public projects, but they promised an October-ish time frame for making it available to public projects (I also got the promise that the ASF is first on the beta list to try it once it is available for public projects). From what I saw in the demo I got from them, this will enable all kinds of automation and management that we currently miss. You will be able to see the issues in a spreadsheet-like form, add custom attributes, and build all kinds of automation around the issues more easily. This will help us enormously with automated triaging of issues.

We are also waiting for Codespaces General Availability (https://github.com/features/codespaces), and our development environment is prepared to be used there out of the box. This will make the path for new contributors even easier: they will be able to start contributing code straight from the GitHub UI.

How to migrate

Here is the approach that the Apache Airflow project took to migrate from JIRA. It's likely applicable to other issue management systems, especially since it is more about community engagement than tools.

We initially thought about migrating the old issues from JIRA with some automated tools and even started doing it, but eventually we decided not to and to "leave" the JIRA issues behind. We moved some "important" ones, and then we informed everyone on the devlist/userlist/Slack etc. and asked for help: if they were still interested in their issues, they could copy them over. We kept info about it in our README for quite a while. We just closed JIRA for new issue entry, and I think we left a comment in the CWIKI space, which we used much more back then, saying that GH Issues were now being used. A lot of people actually moved their issues and a lot did not (which was good as well).

We think this is a really great way to engage with the community and ask them to help. We basically crowd-sourced issue moving. We have to remember that as committers and PMC members we do not have to do everything ourselves. We can always reach out to our community for help. And it worked really nicely. Those authors of issues who did not do this were apparently not interested any more, or maybe they did not follow the issues they created, or maybe the issues were gone already (or even if they were real issues, there was no-one to verify them), so we let the issues "rot" there.

That was a very good choice. A lot of the issues we had in JIRA were already outdated or of poor quality, so this automatically cleaned up the state of our issues. I personally think that if it is not obvious that an issue is really important, and if the author of the issue is not interested in adding extra information when asked or is not following up on it, such issues are better off "forgotten". They add no value to the project, they only add "noise".

This is why we love GitHub Discussions so much. We can convert an issue to a GitHub Discussion if we look at it and it is likely caused by user error, a deployment issue etc. This does not "close" the issue (which is quite mean), but it moves the "responsibility" for continuing the discussion to the author. It's a very clear sign that the discussion might be left in the state of "discussing it" and that there is no intention or expectation that it will be fixed. And we can always create an issue from the discussion if we come to the conclusion that this is a real issue. This has already happened in the past.

