Reducing Build Times
Spark's default build strategy is to assemble a jar including all of its dependencies. This can be cumbersome when doing iterative development. When developing locally, it is possible to create an assembly jar including all of Spark's dependencies and then re-package only Spark itself when making changes.
$ sbt/sbt clean assemble-deps
$ sbt/sbt package
# ... do some local development ... #
$ sbt/sbt package
# ... do some local development ... #
$ sbt/sbt package
# ...
# You can also use ~ to let sbt do incremental builds on file changes
# without running a new sbt session every time
$ sbt/sbt ~package
Checking Out Pull Requests
Git provides a mechanism for fetching remote pull requests into your own local repository. This is useful when reviewing code or testing patches locally. To enable this feature, you'll need to configure git to fetch pull request data from the remote repository. Do this by modifying the .git/config file inside of your Spark directory. Note that the remote may be named something other than "origin" if you've configured it that way:
[remote "origin"]
  url = git@github.com:apache/spark.git
  ... may be other stuff here ...
  fetch = +refs/pull/*/head:refs/remotes/origin/pr/*  # Add this line
Once you've done this, you can fetch and check out remote pull requests:
$ # Fetch pull request data from the remote
$ git fetch origin
$ # Checkout a remote pull request
$ git checkout origin/pr/112
$ # Create a local branch from a remote pull request
$ git checkout origin/pr/112 -b new-branch
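If you want to see which pull request refs exist on the remote before fetching everything, you can list them directly. This is a sketch using standard git commands and assumes the remote is named "origin":

```shell
$ # List the head ref of every open pull request on the remote
$ git ls-remote origin 'refs/pull/*/head'
```

Each line of output pairs a commit SHA with a ref such as refs/pull/112/head, which corresponds to origin/pr/112 under the fetch mapping above.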
Running Individual Tests
Often it is useful to run individual tests in Maven or SBT.
$ # sbt
$ sbt/sbt "test-only org.apache.spark.io.CompressionCodecSuite"
$ sbt/sbt "test-only org.apache.spark.io.*"

$ # Maven
$ mvn clean test -DwildcardSuites=org.apache.spark.io.CompressionCodecSuite
$ mvn clean test -DwildcardSuites=org.apache.spark.io.*
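When running several tests in a row, it can be faster to start one interactive sbt session and issue test-only commands from its prompt, since each sbt/sbt invocation starts a fresh JVM. A sketch (the suite names are examples):

```shell
$ sbt/sbt
> test-only org.apache.spark.io.CompressionCodecSuite
> test-only org.apache.spark.rdd.*
```

Subsequent runs in the same session reuse the warmed-up JVM and compiled classes.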
Generating Dependency Graphs
$ # sbt
$ sbt/sbt dependency-tree

$ # Maven
$ mvn -DskipTests install
$ mvn dependency:tree
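On the Maven side, the full tree can be large; the maven-dependency-plugin's -Dincludes option filters it to matching artifacts. A sketch (the groupId here is only an example):

```shell
$ # Show only dependencies whose groupId matches the filter
$ mvn dependency:tree -Dincludes=org.apache.hadoop
```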
Running Build Targets For Individual Projects
$ # sbt
$ sbt/sbt assembly/assembly

$ # Maven
$ mvn package -DskipTests -pl assembly
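The same pattern works for other modules by substituting the project name: in sbt, prefix the task with the project id; in Maven, pass the module directory to -pl. A sketch, assuming the core module's project id and directory are both named core:

```shell
$ # sbt
$ sbt/sbt core/package

$ # Maven
$ mvn package -DskipTests -pl core
```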