Reducing Build Times
Spark's default build strategy is to assemble a jar including all of its dependencies. This can be cumbersome when doing iterative development. When developing locally, it is possible to create an assembly jar including all of Spark's dependencies and then re-package only Spark itself when making changes.
...
```
$ # sbt
$ build/sbt assembly/assembly

$ # Maven
$ mvn package -DskipTests -pl assembly
```
...
ScalaTest Issues
If the following error occurs when running ScalaTest:
```
An internal error occurred during: "Launching XYZSuite.scala".
java.lang.NullPointerException
```
It is caused by an incorrect Scala library in the classpath. To fix it, right-click the project, select Build Path | Configure Build Path, then:
- Add Library | Scala Library
- Remove scala-library-2.10.4.jar - lib_managed\jars
In the event of "Could not find resource path for Web UI: org/apache/spark/ui/static", it's due to a classpath issue (some classes were probably not compiled). To fix this, it is sufficient to run a test from the command line:
```
build/sbt "test-only org.apache.spark.rdd.SortingSuite"
```
Python Tests
There are some dependencies needed to run the Python tests locally:
- The unit tests will try to run with Python 2.6 (the oldest supported version) if it is available. Python 2.6 needs unittest2 to run the tests, which can be installed with pip2.6.
- NumPy 1.4+ is needed to run the MLlib Python tests; it should also be installed for Python 2.6.
After that, you can run all the Python unit tests with:
```
python/run-tests
```
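As context for the unittest2 requirement above: Python 2.6's standard unittest lacks many of the assertion methods that later test code relies on, so 2.6-era suites commonly fall back to the unittest2 backport. A minimal sketch of that fallback idiom (illustrative, not copied from the Spark tree; the ExampleSuite class is hypothetical):

```python
import sys

# On Python 2.6, fall back to the unittest2 backport, which provides the
# newer TestCase assertion methods (assertIn, assertIsNone, etc.) that the
# standard library's unittest lacks there.
if sys.version_info[:2] <= (2, 6):
    try:
        import unittest2 as unittest
    except ImportError:
        sys.stderr.write("unittest2 is required to run tests on Python 2.6\n")
        sys.exit(1)
else:
    import unittest


class ExampleSuite(unittest.TestCase):
    """A trivial stand-in for a real test suite."""

    def test_sorting(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])
```

On modern Pythons the fallback branch is never taken and the standard library's unittest is used directly.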
R Tests
To run the SparkR tests you will need to install the R package testthat (run `install.packages("testthat")` from the R shell). You can run just the SparkR tests using the command:
```
R/run-tests.sh
```
Organizing Imports
You can use an IntelliJ Imports Organizer from Aaron Davidson to help you organize the imports in your code. It can be configured to match the import ordering from the style guide.
IDE Setup
IntelliJ
While many of the Spark developers use SBT or Maven on the command line, the most common IDE we use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from Preferences > Plugins.
To create a Spark project for IntelliJ:
- Download IntelliJ and install the Scala plug-in for IntelliJ.
- Go to "File -> Import Project", locate the spark source directory, and select "Maven Project".
- In the Import wizard, it's fine to leave settings at their default. However, it is usually useful to enable "Import Maven projects automatically", since changes to the project structure will automatically update the IntelliJ project.
- As documented in Building Spark, some build configurations require specific profiles to be enabled. The same profiles that are enabled with -P[profile name] above may be enabled on the Profiles screen in the Import wizard. For example, if developing for Hadoop 2.4 with YARN support, enable the yarn and hadoop-2.4 profiles. These selections can be changed later by accessing the "Maven Projects" tool window from the View menu and expanding the Profiles section.
Other tips:
- "Rebuild Project" can fail the first time the project is compiled, because generated source files are not built automatically. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
- Compilation may fail with an error like "scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar". If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. The build will then work, although the option will come back when the project reimports. If you try to build any of the projects using quasiquotes (e.g., sql), then you will need to make that jar a compiler plugin (just below "Additional compiler options"). Otherwise you will see errors like:
```
/Users/irashid/github/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
Error:(147, 9) value q is not a member of StringContext
 Note: implicit class Evaluate2 is not applicable here because it comes after the application point and it lacks an explicit result type
            q"""
            ^
```
Eclipse
Eclipse can be used to develop and test Spark. The following configuration is known to work:
- Eclipse Juno
- Scala IDE v 3.0.3
- Scala Test
Scala IDE can be installed from Help | Eclipse Marketplace... by searching for Scala IDE. Remember to include Scala Test as a Scala IDE plugin. To install Scala Test after installing Scala IDE, follow these steps:
- Select Help | Install New Software
- Select http://download.scala-ide.org... in the "Work with" combo box
- Expand Scala IDE plugins, select ScalaTest for Scala IDE, and install it
SBT can create Eclipse .project and .classpath files. To create these files for each Spark sub-project, use this command:
```
sbt/sbt eclipse
```
To import a specific project, e.g. spark-core, select File | Import | Existing Projects into Workspace. Do not select "Copy projects into workspace". Importing all Spark sub-projects at once is not recommended.
ScalaTest can execute unit tests by right-clicking a source file and selecting Run As | Scala Test.
If Java memory errors occur, it might be necessary to increase the settings in eclipse.ini in the Eclipse install directory. Increase the following setting as needed:
```
--launcher.XXMaxPermSize
256M
```