...

Finally, update CHANGES.txt with this script in the Spark repository. CHANGES.txt captures all the patches that have made it into this release candidate since the last release.

 

Code Block
languagebash
$ export SPARK_HOME=<your Spark home>
$ cd spark
# Update release versions
$ vim dev/create-release/generate-changelist.py
$ dev/create-release/generate-changelist.py

This produces a CHANGES.txt.new that should be a superset of the existing CHANGES.txt. Replace the old CHANGES.txt with the new one (see this example commit).

Ensure Spark is Ready for a Release

  • Check JIRA for remaining issues tied to the release. Review and merge any blocking features.
  • Ensure Spark versions are correct in the codebase

    • See this example commit
    • You should "grep" through the codebase to find all instances of the version string (a grep sketch is shown after this list). Some known places to change are:
      • SparkContext.scala version string (only for branch-1.x)
      • SBT build: Change version in file 'project/SparkBuild.scala'
      • Maven build: Change the version in ALL the pom.xml files in the repo. Note that the version should be SPARK-VERSION_SNAPSHOT; it will be changed to SPARK-VERSION automatically by Maven when cutting the release.
        • Exception: change 'yarn/alpha/pom.xml' to SPARK-VERSION. This differs from the main 'pom.xml' because the YARN alpha module is not published as a Maven artifact when cutting the release, so its version does not get bumped from SPARK-VERSION_SNAPSHOT to SPARK-VERSION.
      • Spark REPLs
        • Scala REPL: Check inside 'repl/src/main/scala/org/apache/spark/repl/'
        • Python REPL: Check inside 'python/pyspark'
      • Docs: Change in file 'docs/_config.yml'
      • Spark EC2 scripts: Change the mapping between Spark and Shark versions and the default Spark version used in the cluster
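
One way to find lingering occurrences of the old version string is a recursive grep from the repository root (a sketch; the version strings below are examples, substitute the ones for your release):

Code Block
languagebash
# Version strings are examples -- substitute the previous release / snapshot versions
$ cd $SPARK_HOME
$ grep -rn "1.0.0-SNAPSHOT" --include=pom.xml .
$ grep -rn "1.0.0" project/SparkBuild.scala docs/_config.yml python/pyspark repl/src/main/scala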

Check Out and Run Tests

Code Block
languagebash
$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-0.9
$ cd spark
$ sbt/sbt assembly
$ export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g"
$ mvn test
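
To iterate on a single failing suite instead of the full test run, the ScalaTest Maven plug-in used by the build accepts a wildcardSuites property (a sketch; the suite name is only an example, and support depends on the plug-in version in the branch you are testing):

Code Block
languagebash
# Suite name is an example -- runs only that suite rather than the whole test matrix
$ mvn -DwildcardSuites=org.apache.spark.repl.ReplSuite test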

Check for Dead Links in the Docs

Code Block
languagebash
$ cd $SPARK_HOME/docs
$ jekyll serve --watch   # leave this running; run the remaining commands in a separate terminal
$ sudo apt-get install linkchecker
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings

Set up EC2 Instance (Recommended)

The process of cutting a release requires a number of tools to be installed locally (Maven, Jekyll, etc.). Ubuntu users can install those tools via apt-get. However, it may be most convenient to use an EC2 instance based on the AMI ami-e9eda8d9 (available in US-West, with Scala 2.10.3 and SBT 0.13.1 installed), which has all the necessary tools installed. Mac users in particular are recommended to use an EC2 instance rather than attempting to install all the necessary tools locally. If you want to prepare your own EC2 instance (with a different version of Scala, SBT, etc.), follow the steps given in the Miscellaneous section at the end of this document.
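
For example, with the AWS CLI installed on your local machine, an instance based on that AMI can be launched roughly as follows (a sketch; the instance type, key pair, security group, and region are placeholders you need to fill in):

Code Block
languagebash
# Illustrative sketch -- instance type, key pair, security group, and region are placeholders
$ aws ec2 run-instances --region <us-west region> --image-id ami-e9eda8d9 \
    --instance-type m3.large --key-name <your key pair> --security-groups <your security group>
# SSH in as the 'ubuntu' user once the instance is running
$ ssh -i <path to your key>.pem ubuntu@<instance public DNS>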

...

Transfer your GPG keys from your home machine to the EC2 instance.

Code Block
languagebash
# == On home machine ==
$ gpg --list-keys  # Identify the KEY_ID of the key you generated
$ gpg --output pubkey.gpg --export <KEY_ID>
$ gpg --output - --export-secret-key <KEY_ID> | cat pubkey.gpg - | gpg --armor --output keys.asc --symmetric --cipher-algo AES256
# Copy keys.asc to EC2 instance
 
# == On EC2 machine ==
# May be necessary if the gpg files are not owned by the current user
$ sudo chown -R ubuntu:ubuntu ~/.gnupg/*

# Import the keys
$ sudo gpg --no-use-agent --output - keys.asc | gpg --import
 
# Confirm that your key has been imported, then remove the keys file
$ gpg --list-keys
$ rm keys.asc
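
To confirm that the imported key can actually sign on the instance, you can sign and verify a throw-away file (illustrative):

Code Block
languagebash
# Sign and verify a scratch file to confirm the imported secret key works
$ echo test > /tmp/gpg-test.txt
$ gpg --armor --detach-sign /tmp/gpg-test.txt
$ gpg --verify /tmp/gpg-test.txt.asc /tmp/gpg-test.txt
$ rm /tmp/gpg-test.txt /tmp/gpg-test.txt.asc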

...

Install the private key that gives you password-less access to the Apache web space.

...

Set the git user name and email (these will appear as the committer on the release commits).

Code Block
languagebash
$ git config --global user.name "Tathagata Das"
$ git config --global user.email tathagata.das1565@gmail.com

Check out the appropriate version of Spark, i.e. the one that has the right release scripts. For instance, to check out the master branch, run "git clone https://git-wip-us.apache.org/repos/asf/spark.git".
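
For a maintenance release, clone the corresponding release branch instead of master (the branch name below is only an example):

Code Block
languagebash
# Branch name is an example -- use the branch you are cutting the release from
$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-1.0
$ cd spark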

Set up Maven

Make sure Maven is configured with your Apache username and password. Your ~/.m2/settings.xml should have the following.

Code Block
languagexml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                      http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <servers>
    <server>
      <id>apache.snapshots.https</id>
      <username>APACHE_USERNAME</username>
      <password>PASSWORD</password>
    </server>
    <server>
      <id>apache.releases.https</id>
      <username>APACHE_USERNAME</username>
      <password>PASSWORD</password>
    </server>
  </servers>
</settings>
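
Keeping the Apache password in plain text is not ideal. Maven's built-in password encryption can be used instead (a sketch; see the Maven password-encryption guide for details):

Code Block
languagebash
# Generate a master password and store the output in ~/.m2/settings-security.xml
$ mvn --encrypt-master-password
# Encrypt your Apache password and use the output as the <password> value in settings.xml
$ mvn --encrypt-password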


Cutting a Release Candidate

Cutting a release candidate involves two steps. First, we use the Maven release plug-in to create a release commit (a single commit where all of the version files have the correct release number) and publish the code associated with that release to a Maven staging repository. Second, we check out that release commit and package the binary releases and documentation.
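
For reference, the underlying Maven release plug-in invocation looks roughly like the following (illustrative only; the actual process is driven by the release scripts, and the version and tag shown are placeholders):

Code Block
languagebash
# Illustrative only -- version and tag are placeholders; the release scripts wrap these goals
$ mvn release:clean
$ mvn release:prepare -DreleaseVersion=<SPARK-VERSION> -Dtag=<release tag> -DdryRun=true   # creates the release commit and tag (dry run shown)
$ mvn release:perform   # checks out the tag, builds, and publishes to the Maven staging repository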

...