Reducing Build Times
Spark's default build strategy is to assemble a jar including all of its dependencies. This can be cumbersome when doing iterative development. When developing locally, it is possible to create an assembly jar including all of Spark's dependencies and then re-package only Spark itself when making changes.
Fast Local Builds:
$ build/sbt clean assembly # Create a normal assembly
$ ./bin/spark-shell # Use Spark with the normal assembly
$ export SPARK_PREPEND_CLASSES=true
$ ./bin/spark-shell # Now it's using compiled classes
# ... do some local development ... #
$ build/sbt compile
# ... do some local development ... #
$ build/sbt compile
$ unset SPARK_PREPEND_CLASSES
$ ./bin/spark-shell # Back to normal, using Spark classes from the assembly jar
# You can also use ~ to let sbt do incremental builds on file changes without running a new sbt session every time
$ build/sbt ~compile
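If you only want the prepended classes for a single run, you can set the variable for just that command instead of exporting it; this is plain shell behavior, not anything Spark-specific:

$ SPARK_PREPEND_CLASSES=true ./bin/spark-shell # Applies only to this invocation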
Note: in some earlier versions of Spark, fast local builds used an sbt task called assemble-deps; SPARK-1843 removed assemble-deps and introduced the environment variable described above. For those older versions:

$ build/sbt clean assemble-deps
$ build/sbt package
# ... do some local development ... #
$ build/sbt package
# ... do some local development ... #
$ build/sbt package
# ...
# You can also use ~ to let sbt do incremental builds on file changes without running a new sbt session every time
$ build/sbt ~package
Checking Out Pull Requests
Git provides a mechanism for fetching remote pull requests into your own local repository. This is useful when reviewing code or testing patches locally. If you haven't yet cloned the Spark Git repository, use the following command:
$ git clone https://github.com/apache/spark.git
$ cd spark
To enable this feature, configure the git remote repository to fetch pull request data by modifying the .git/config file inside your Spark directory (the remote may be named something other than "origin" if you chose a different name when cloning):
.git/config:
[remote "origin"]
url = git@github.com:apache/spark.git
fetch = +refs/heads/*:refs/remotes/origin/*
fetch = +refs/pull/*/head:refs/remotes/origin/pr/* # Add this line
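If you'd rather not edit .git/config by hand, the same fetch line can be added with a single git command (again assuming your remote is named "origin"):

$ git config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/origin/pr/*'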
Once you've done this, you can fetch remote pull requests:
# Fetch remote pull requests
$ git fetch origin
# Checkout a remote pull request
$ git checkout origin/pr/112
# Create a local branch from a remote pull request
$ git checkout origin/pr/112 -b new-branch
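Alternatively, GitHub exposes every pull request under refs/pull/<number>/head, so you can fetch a single pull request on demand without the config change above (PR 112 and the branch name here are just examples):

# Fetch pull request 112 directly into a local branch named pr-112
$ git fetch origin pull/112/head:pr-112
$ git checkout pr-112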
Running Individual Tests
It is often useful to run individual tests in Maven or sbt:
# sbt
$ build/sbt "test-only org.apache.spark.io.CompressionCodecSuite"
$ build/sbt "test-only org.apache.spark.io.*"
# Maven, run Scala test
$ mvn test -DwildcardSuites=org.apache.spark.io.CompressionCodecSuite -Dtest=none
$ mvn test -DwildcardSuites=org.apache.spark.io.* -Dtest=none
# Maven, run Java test
$ mvn test -DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite
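To narrow things down further, ScalaTest accepts runner arguments after --; for example, its -z flag runs only the tests whose names contain a given substring (the suite and substring below are illustrative):

# sbt: run only the tests in CompressionCodecSuite whose names contain "lz4"
$ build/sbt "test-only org.apache.spark.io.CompressionCodecSuite -- -z lz4"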
Generating Dependency Graphs
# sbt
$ build/sbt dependency-tree
# Maven
$ mvn -DskipTests install
$ mvn dependency:tree
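Maven's tree can get long; the dependency plugin's -Dincludes filter restricts the output to matching artifacts (the Guava groupId below is just an example):

# Show only Guava and its paths into the tree
$ mvn dependency:tree -Dincludes=com.google.guava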
Running Build Targets For Individual Projects
# sbt
$ build/sbt assembly/assembly
# Maven
$ mvn package -DskipTests -pl assembly
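Two related tricks, both using example module names: sbt runs any task with the same project/task syntax (e.g. core/test for just the core project's tests), and Maven's -am flag also builds the in-tree modules your chosen project depends on:

# sbt: run only the core project's tests
$ build/sbt core/test
# Maven: package core plus the modules it depends on
$ mvn package -DskipTests -pl core -am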
Organizing Imports
You can use an IntelliJ Imports Organizer from Aaron Davidson to help you organize the imports in your code. Configure it under Preferences / Editor / Code Style / Scala Imports Organizer with:
import java.*
import javax.*
import scala.*
import *
...
This page has moved permanently to http://spark.apache.org/developer-tools.html