Reducing Build Times

Spark's default build strategy is to assemble a jar including all of its dependencies. This can be cumbersome when doing iterative development. When developing locally, it is possible to create an assembly jar including all of Spark's dependencies and then re-package only Spark itself when making changes.

...

Code Block
$ # sbt
$ build/sbt assembly/assembly
$ # Maven
$ mvn package -DskipTests -pl assembly
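
Once the assembly has been built, a rough sketch of re-packaging only what you have changed (the core module is used purely as an example):

Code Block
$ # sbt: recompile and package the sub projects without rebuilding the assembly
$ build/sbt package
$ # Maven: rebuild only the module you changed, e.g. core
$ mvn package -DskipTests -pl core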

...

ScalaTest Issues

If the following error occurs when running ScalaTest

Code Block
An internal error occurred during: "Launching XYZSuite.scala".
java.lang.NullPointerException

This is caused by an incorrect Scala library on the classpath. To fix it, right-click the project, select Build Path | Configure Build Path, and then:

  • Add Library | Scala Library
  • Remove scala-library-2.10.4.jar - lib_managed\jars

If you see "Could not find resource path for Web UI: org/apache/spark/ui/static", this is a classpath issue (some classes were probably not compiled). To fix it, it is sufficient to run a test from the command line:

Code Block
build/sbt "test-only org.apache.spark.rdd.SortingSuite"
Python Tests

Running the Python tests locally requires a few dependencies:

The unit tests will try to run with Python 2.6 (the oldest supported version) if it is available. Python 2.6 needs unittest2 to run the tests, which can be installed with pip2.6.

NumPy 1.4+ is needed to run the MLlib Python tests; it should also be installed for Python 2.6.
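
As a rough sketch, these prerequisites could be installed as follows (assuming pip2.6 is on your PATH; adjust for your environment):

Code Block
$ # Python 2.6 needs the unittest2 backport
$ pip2.6 install unittest2
$ # NumPy 1.4+ for the MLlib Python tests
$ pip2.6 install numpy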

After that, you can run all of the Python unit tests with:

Code Block
python/run-tests
R Tests

To run the SparkR tests you will need to install the R package 'testthat' (run `install.packages("testthat")` from the R shell). You can run just the SparkR tests using the command

Code Block
R/run-tests.sh
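
If you prefer to install testthat without opening an interactive R session, one possible approach (a sketch; the CRAN mirror URL is just an example):

Code Block
$ # install the testthat package non-interactively
$ Rscript -e 'install.packages("testthat", repos = "https://cloud.r-project.org")'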

Organizing Imports

You can use the IntelliJ Imports Organizer from Aaron Davidson to help you organize the imports in your code. It can be configured to match the import ordering from the style guide.

 

IDE Setup

IntelliJ

While many of the Spark developers use SBT or Maven on the command line, the most common IDE we use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from Preferences > Plugins.

To create a Spark project for IntelliJ:

  1. Download IntelliJ and install the Scala plug-in for IntelliJ.
  2. Go to "File -> Import Project", locate the Spark source directory, and select "Maven Project".
  3. In the Import wizard, it's fine to leave settings at their defaults. However, it is usually useful to enable "Import Maven projects automatically", since changes to the project structure will automatically update the IntelliJ project.
  4. As documented in Building Spark, some build configurations require specific profiles to be enabled. The same profiles that are enabled with -P[profile name] on the command line may be enabled on the Profiles screen in the Import wizard. For example, if developing for Hadoop 2.4 with YARN support, enable the yarn and hadoop-2.4 profiles. These selections can be changed later by opening the "Maven Projects" tool window from the View menu and expanding the Profiles section. A command-line equivalent is sketched below.
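
For reference, a sketch of the corresponding command-line Maven build with those profiles enabled (the profile names follow the Hadoop 2.4 / YARN example above; see Building Spark for the profiles that match your setup):

Code Block
$ # Maven build with the yarn and hadoop-2.4 profiles enabled (example only)
$ mvn -Pyarn -Phadoop-2.4 -DskipTests package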

Other tips:

  • "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
  • Compilation may fail with an error like "scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar". If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. Compilation will then work, although the option will come back when the project is reimported. If you build any of the projects that use quasiquotes (e.g., sql), you will need to add that jar as a compiler plugin (just below "Additional compiler options"). Otherwise you will see errors like:

 

Code Block
/Users/irashid/github/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
Error:(147, 9) value q is not a member of StringContext
 Note: implicit class Evaluate2 is not applicable here because it comes after the application point and it lacks an explicit result type
        q"""
        ^ 

Eclipse

Eclipse can be used to develop and test Spark. The following configuration is known to work:

Scala IDE can be installed from Help | Eclipse Marketplace... by searching for Scala IDE. Remember to include ScalaTest as a Scala IDE plugin. To install ScalaTest after installing Scala IDE, follow these steps:

  • Select Help | Install New Software
  • Select http://download.scala-ide.org... in the "Work with" combo box
  • Expand Scala IDE plugins, select ScalaTest for Scala IDE and install

SBT can create Eclipse .project and .classpath files. To create these files for each Spark sub project, use this command:

Code Block
sbt/sbt eclipse

To import a specific project, e.g. spark-core, select File | Import | Existing Projects into Workspace. Do not select "Copy projects into workspace". Importing all Spark sub projects at once is not recommended.

ScalaTest can execute unit tests by right clicking a source file and selecting Run As | Scala Test.

If Java memory errors occur, it might be necessary to increase the settings in eclipse.ini in the Eclipse install directory. Increase the following setting as needed:

Code Block
--launcher.XXMaxPermSize
256M
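
For example, the relevant fragment of eclipse.ini might look like the following after the change (the values shown are illustrative only; the -Xmx value under -vmargs can also be raised if heap errors occur):

Code Block
--launcher.XXMaxPermSize
512M
-vmargs
-Xmx1024m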