...

Contributing to Spark doesn't just mean writing code. Helping new users on the mailing list, testing releases, and improving documentation are also welcome. In fact, proposing significant code changes usually requires first gaining experience and credibility within the community by helping in other ways. This is also a guide to becoming an effective contributor.

This guide therefore organizes contributions in roughly the order that new contributors who intend to get involved long-term should consider them. Build a track record of helping others, rather than just opening pull requests.

Contributing by Helping Other Users

...

It is possible to propose new features as well. These are generally not helpful unless accompanied by detail, such as a design document and/or a code change. Large new contributions should first consider http://spark-packages.org (see above), or be discussed on the mailing list first. Feature requests may be rejected, or closed after a long period of inactivity.

Contributing to JIRA Maintenance

Given the sheer volume of issues raised in the Apache Spark JIRA, some issues inevitably are duplicates, become obsolete, are eventually fixed by other changes, can't be reproduced, or could benefit from more detail. It's useful to help identify these issues and resolve them, either by advancing the discussion or by resolving the JIRA. Most contributors are able to resolve JIRAs directly. Use judgment to decide whether you are confident the issue should be resolved, although such changes are easy to undo. If in doubt, just leave a comment on the JIRA.

When resolving JIRAs, observe a few useful conventions:

  • Resolve as Fixed if there's a change you can point to that resolved the issue
    • Set Fix Version(s), if and only if the resolution is Fixed
    • Set Assignee to the person who most contributed to the resolution, which is usually the person who opened the PR that resolved the issue.
    • In case several people contributed, prefer to assign to the more 'junior', non-committer contributor
  • For issues that can't be reproduced against master as reported, resolve as Cannot Reproduce
    • Fixed is also reasonable if it's clear which earlier pull request resolved it. Link to that pull request.
  • If the issue is the same as, or a subset of, another issue, resolve as Duplicate
    • Make sure to link to the JIRA it duplicates
    • Prefer to resolve the issue that has less activity or discussion as the duplicate
  • If the issue seems clearly obsolete and applies to issues or components that have changed radically since it was opened, resolve as Not a Problem
  • If the issue doesn't make sense (for example, it is not actionable, or it is a non-Spark issue), resolve as Invalid
  • If it's a coherent issue, but there is a clear indication that there is no support or interest in acting on it, resolve as Won't Fix
  • Umbrellas are frequently marked Done if they are just container issues that don't correspond to an actionable change of their own
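The conventions above can be read as a small decision procedure. The sketch below is purely illustrative and is not part of any Spark or JIRA tooling: the `Issue` class, its fields, `suggest_resolution`, and `check_fix_versions` are hypothetical names. The guide does not specify an order of precedence between the conventions, so the ordering here is an assumption. Fixed is checked first because an identified fixing change takes priority even for issues that no longer reproduce. Returning `None` stands for "if in doubt, just leave a comment".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Issue:
    """Hypothetical, simplified triage facts about a JIRA issue."""
    fixed_by_change: bool = False        # a specific change (e.g. a merged PR) resolved it
    reproducible_on_master: bool = True  # can it still be reproduced as reported?
    duplicate_of: Optional[str] = None   # key of the issue this one duplicates
    actionable: bool = True              # False for e.g. a non-Spark issue
    obsolete: bool = False               # components changed radically since filing
    is_umbrella: bool = False            # container issue with no change of its own
    has_support: bool = True             # False if clearly no interest in acting on it


def suggest_resolution(issue: Issue) -> Optional[str]:
    """Map triage facts to a resolution name; None means 'leave a comment instead'.

    The order of checks is an assumption; the guide gives no explicit precedence.
    """
    if issue.fixed_by_change:
        return "Fixed"
    if issue.duplicate_of is not None:
        return "Duplicate"
    if not issue.reproducible_on_master:
        return "Cannot Reproduce"
    if not issue.actionable:
        return "Invalid"
    if issue.obsolete:
        return "Not a Problem"
    if issue.is_umbrella:
        return "Done"
    if not issue.has_support:
        return "Won't Fix"
    return None


def check_fix_versions(resolution: Optional[str], fix_versions: List[str]) -> bool:
    """Fix Version(s) must be set if and only if the resolution is Fixed."""
    return (resolution == "Fixed") == bool(fix_versions)


if __name__ == "__main__":
    issue = Issue(reproducible_on_master=False, fixed_by_change=True)
    resolution = suggest_resolution(issue)
    print(resolution, check_fix_versions(resolution, []))
```

Run directly, the example prints `Fixed False`. The issue is resolved as Fixed, but the empty Fix Version list violates the "if and only if" rule. In practice these decisions are made by a person in the JIRA UI. The sketch only makes the relationships between the conventions explicit.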

Preparing to Contribute Code Changes

...