

  1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
  2. Create a JIRA issue for your patch on the Spark Project JIRA.
  3. Submit the patch as a GitHub pull request. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Title your pull request with the JIRA issue key and, if relevant, the Spark module or WIP, for example:
    1. SPARK-123: Add some feature to Spark

    2. [STREAMING] SPARK-123: Add some feature to Spark streaming

    3. [MLLIB] [WIP] SPARK-123: Some potentially useful feature for MLLib

  4. Follow the Spark Code Style Guide.
  5. Make sure that your code passes the unit tests. You can run the tests with sbt/sbt assembly and then sbt/sbt test in the root directory of Spark. It's important to run assembly first as some of the tests depend on compiled JARs.
  6. Add new unit tests for your code. We use ScalaTest for testing. Add a new Suite in core/src/test, or add new tests to an existing Suite.
  7. Update the documentation (in the docs folder) if you add a new feature or configuration parameter.
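As an illustration of step 6, here is a minimal sketch of a new ScalaTest Suite. It assumes the ScalaTest version in the build provides `org.scalatest.FunSuite` (newer ScalaTest releases, 3.1 and later, name it `org.scalatest.funsuite.AnyFunSuite`) and that `org.apache.spark.SparkContext` is on the test classpath. The file path, the class name `MyFeatureSuite`, the helper `withSparkContext`, and both test names are hypothetical placeholders, not existing Spark code.

```scala
// Hypothetical file: core/src/test/scala/org/apache/spark/MyFeatureSuite.scala
import org.apache.spark.SparkContext
import org.scalatest.FunSuite

// A minimal ScalaTest suite; the class and test names are placeholders.
class MyFeatureSuite extends FunSuite {

  // Hypothetical helper: run `body` against a local SparkContext and always
  // stop it afterwards, so one test's context does not leak into the next.
  private def withSparkContext(body: SparkContext => Unit): Unit = {
    val sc = new SparkContext("local", "MyFeatureSuite")
    try body(sc) finally sc.stop()
  }

  test("map doubles every element") {
    withSparkContext { sc =>
      val doubled = sc.parallelize(1 to 4).map(_ * 2).collect()
      assert(doubled.toList === List(2, 4, 6, 8))
    }
  }

  test("reduce sums a range") {
    withSparkContext { sc =>
      assert(sc.parallelize(1 to 100).reduce(_ + _) === 5050)
    }
  }
}
```

Each test creates and stops its own local context, which keeps tests independent at the cost of some startup time per test. Once the file is in core/src/test, the suite runs with the rest of the unit tests under sbt/sbt test, as described in step 5.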


While many of the Spark developers use SBT or Maven on the command line, the most common IDE we use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from Preferences > Plugins. To generate an IDEA workspace for Spark, run


sbt/sbt update gen-idea

Then import the folder into IDEA. When you build the project, you might get a warning about "test and compile output paths" being the same for the "root-build" project. You can fix it by opening File -> Project Structure and changing the output path of the root-build module to be <spark-home>/project/target/idea-test-classes instead of idea-classes.
