The Apache Spark team welcomes all types of contributions, whether they be bug reports, documentation, or new patches.
Reporting Issues
If you'd like to report a bug in Spark or ask for a new feature, open an issue on the Apache Spark JIRA. For general usage help, you should email the user mailing list.
Contributing Code
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/incubator-spark repository. If you've previously forked Spark from its old location, you will need to fork incubator-spark instead.
Here are a few tips to get your contribution in:
- Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
- Submit the patch as a GitHub pull request. For a tutorial, see the GitHub guides on forking a repo and sending a pull request.
- Follow the Spark code style guide.
- Make sure that your code passes the unit tests. You can run the tests with sbt/sbt assembly and then sbt/sbt test in the root directory of Spark. It's important to run assembly first as some of the tests depend on compiled JARs.
- Add new unit tests for your code. We use ScalaTest for testing. Just add a new Suite in core/src/test, or methods to an existing Suite.
- Update the documentation (in the docs folder) if you add a new feature or configuration parameter.
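The test-running tip above can be wrapped in a small shell function so that the tests only start once the assembly step has succeeded. This is a minimal sketch, not part of the Spark build: the function name run_spark_tests and the SBT override variable are hypothetical, and only the sbt/sbt assembly and sbt/sbt test commands come from the text above.

```shell
#!/bin/sh
# Sketch only: run_spark_tests and the SBT variable are illustrative names,
# not something shipped with Spark.

# Launcher script, relative to the root directory of Spark.
SBT="${SBT:-sbt/sbt}"

run_spark_tests() {
  # Build the assembly first, since some tests depend on the compiled JARs.
  "$SBT" assembly || return 1
  # Reached only when the assembly step exited successfully.
  "$SBT" test
}
```

After sourcing this from the root directory of Spark, calling run_spark_tests runs both steps in order; a failed assembly returns a non-zero status without starting the tests.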
If you’d like to report a bug but don’t have time to fix it, you can still post it to our issue tracker, or email the mailing list.
Starter Tasks
If you are new to Spark and want to contribute, you can browse through the list of starter tasks on our JIRA. These tasks are typically small and simple, and are excellent problems to get you ramped up.
Documentation
If you'd like to contribute documentation, there are two ways:
- To have us add a link to an external tutorial you wrote, simply email the developer mailing list.
- To modify the built-in documentation, edit the Markdown source files in Spark's docs directory, and send a patch against the incubator-spark GitHub repository. The README file in docs explains how to build the documentation locally to test your changes.
Development Discussions
To keep up to date with the latest discussions, join the developer mailing list.
IDE Setup
While many of the Spark developers use SBT or Maven on the command line, the most common IDE we use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from Preferences > Plugins. To generate an IDEA workspace for Spark, run
sbt/sbt update gen-idea
Then import the folder into IDEA. When you build the project, you might get a warning about "test and compile output paths" being the same for the "root-build" project. You can fix it by opening File -> Project Structure and changing the output path of the root-build module to be <spark-home>/project/target/idea-test-classes instead of idea-classes.
If you use Eclipse to develop Spark, feel free to add a short guide on setting it up to this wiki page.