This document details the steps required to cut a Spark release. This was last updated on 11/12/14 for the 1.1.1 release.

Table of Contents

Prerequisites

...

Git Push Access. You will need push access to https://git-wip-us.apache.org/repos/asf/spark.git. Additionally, make sure your git username and email are set on the machine you plan to run the release on.

Code Block
languagebash
$ git config --global user.name <your name>
$ git config --global user.email <your email>

...

Create a GPG Key

You will need a GPG key to sign your artifacts (http://apache.org/dev/release-signing). If you are using the provided AMI, this is already installed. Otherwise, you can get it through sudo apt-get install gnupg on Ubuntu or from http://gpgtools.org on Mac OS X.

Code Block
languagebash
# Create new key. Make sure it uses RSA and 4096 bits
# Password is optional. DO NOT SET EXPIRATION DATE!
$ gpg --gen-key

# Confirm that key is successfully created
# If there is more than one key, be sure to set the default
# key through ~/.gnupg/gpg.conf
$ gpg --list-keys

# Generate public key to distribute to Apache infrastructure
# <KEY_ID> is the 8-character hex string next to "pub 4096R"
$ gpg --output <KEY_ID>.asc --export -a <KEY_ID>

# Distribute public key to the server
$ gpg --send-key <KEY_ID>

# Upload key digest to http://id.apache.org
# The fingerprint is a series of 4-digit hex groups
$ gpg --fingerprint

# Copy generated key to Apache web space
# Eventually, key will show up on Apache people page
# (see https://people.apache.org/keys/committer/andrewor14.asc)
$ scp <KEY_ID>.asc <USER>@people.apache.org:~/

(Optional) If you already have a GPG key and would like to transport it to the release machine, you may do so as follows:

Code Block
languagebash
# === On host machine ===
# Identify the KEY_ID of the selected key
$ gpg --list-keys

# Export the secret key and transfer it
$ gpg --output pubkey.gpg --export <KEY_ID>
$ gpg --output - --export-secret-key <KEY_ID> |
cat pubkey.gpg - | gpg --armor --output key.asc --symmetric --cipher-algo AES256
$ scp key.asc <release machine hostname>:~/

# === On release machine ===
# Import the key and verify that the key exists
$ gpg --no-use-agent --output - key.asc | gpg --import
$ gpg --list-keys
$ rm key.asc

Set up Maven Password

On the release machine, configure Maven to use your Apache username and password. Your ~/.m2/settings.xml should contain the following:

Code Block
languagexml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 
         http://maven.apache.org/xsd/settings-1.0.0.xsd">
<servers>
  <server>
    <id>apache.snapshots.https</id>
    <username>YOUR USERNAME</username>
    <password>PASSWORD</password>
  </server>
  <server>
    <id>apache.releases.https</id>
    <username>YOUR USERNAME</username>
    <password>PASSWORD</password>
  </server>
</servers>
</settings>

Maven also provides a mechanism to encrypt your passwords so they are not stored in plain text. You will need to create an additional ~/.m2/settings-security.xml to store your master password (see http://maven.apache.org/guides/mini/guide-encryption.html). Note that in other steps you are still required to specify your password in plain text.
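
As a minimal sketch of that workflow, assuming a recent Maven 3.x (the exact file contents are shown as comments):

Code Block
languagebash
# Generate an encrypted master password and store it in ~/.m2/settings-security.xml
$ mvn --encrypt-master-password
# Put the output in ~/.m2/settings-security.xml as:
#   <settingsSecurity>
#     <master>{GENERATED_MASTER_PASSWORD}</master>
#   </settingsSecurity>

# Encrypt your Apache password and use the result as the <password> value in ~/.m2/settings.xml
$ mvn --encrypt-password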

Preparing Spark for Release

First, check if there are outstanding blockers for your target version on JIRA. If there are none, make sure the unit tests pass. Note that the Maven tests are highly dependent on the run environment. It’s a good idea to verify that they have been passing in Jenkins before spending hours trying to fix them yourself.

Code Block
languagebash
$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-1.1
$ cd spark
$ sbt/sbt clean assembly test

# Ensure MAVEN_OPTS is set with at least 3G of JVM memory
$ mvn -DskipTests clean package
$ mvn test

Additionally, check for dead links in the documentation.

Code Block
languagebash
$ cd spark/docs
$ jekyll serve --watch
# In another shell, while the docs are being served:
$ sudo apt-get install linkchecker
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings

Next, ensure that all Spark versions are correct in the code base (see this example commit). You should grep through the codebase to find all instances of the version string; a sketch follows this list. Some known places to change are:

  •  SparkContext. Search for VERSION (only for branch 1.x)
  • Maven build. Ensure that the version in all the pom.xml files is <SPARK-VERSION>-SNAPSHOT (e.g. 1.1.1-SNAPSHOT). This will be changed to <SPARK-VERSION> (e.g. 1.1.1) automatically by Maven when cutting the release. Note that there are a few exceptions that should just use <SPARK-VERSION>, namely yarn/alpha/pom.xml and extras/java8-tests/pom.xml. These modules are not published as artifacts.
  • Spark REPLs. Look for the Spark ASCII art in SparkILoopInit.scala for the Scala shell and in shell.py for the Python REPL.
  • Docs. Search for VERSION in docs/_config.yml
  • Spark EC2 scripts. Update default Spark version and mapping between Spark and Shark versions.
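
For example, a hedged sketch of such a grep for the 1.1.1 release (the version strings and paths are illustrative; adjust them for your release):

Code Block
languagebash
# Find leftover references to the snapshot and previous release versions
$ git grep -n "1\.1\.1-SNAPSHOT"
$ git grep -n "1\.1\.0" docs/ python/ ec2/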

Finally, update CHANGES.txt with this script in the Spark repository. CHANGES.txt captures all the patches that have made it into this release candidate since the last release.

Code Block
languagebash
$ export SPARK_HOME=<your Spark home>
$ cd spark
# Update release versions
$ vim dev/create-release/generate-changelist.py
$ dev/create-release/generate-changelist.py

This produces a CHANGES.txt.new that should be a superset of the existing CHANGES.txt. Replace the old CHANGES.txt with the new one (see this example commit).
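
A minimal sketch of the swap (the commit message is illustrative):

Code Block
languagebash
# Replace the old changelist with the newly generated one and commit it
$ mv CHANGES.txt.new CHANGES.txt
$ git commit -am "Update CHANGES.txt for the 1.1.1 release"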

Cutting a Release Candidate

Cutting a release candidate involves two steps. First, we use the Maven release plug-in to create a release commit (a single commit where all of the version files have the correct number) and publish the code associated with that release to a staging repository in Maven. Second, we check out that release commit and package binary releases and documentation.

Create and Stage a Release Candidate

Create / update CHANGES.txt

CHANGES.txt captures all the patches that have made it into this release candidate. It can be generated using this script.

  • Check out the Spark release version in a Spark git repository.
  • Download the script to a location within the repo.
  • Update the previous release tag and other information in the script.
  • Set the SPARK_HOME environment variable and run the script.

    Code Block
    languagebash
    $ export SPARK_HOME="..."
    $ python -u generate-changelist.py
Update JIRA

If this is not the first RC, make sure that JIRA issues resolved since the last RC (that is, issues that will make it into this new RC) are marked as fixed in this release version.

  • A possible protocol for this is to mark such JIRA issues as fixed in the next maintenance release. For example, if you are cutting an RC for 1.0.2, mark such issues as 1.0.3; for an RC of 1.1, mark them as 1.1.1.
  • When cutting the new RC, find all the issues that are marked as fixed for the next maintenance release and change them to the current release. Also verify from the git log whether they actually made it into the new RC (see the sketch below).
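
As a hedged sketch, that git log check can be done along these lines (the branch name and issue ID are illustrative):

Code Block
languagebash
# Confirm that the fix for a given JIRA issue is actually in the release branch
$ git log branch-1.1 --oneline | grep "SPARK-1234"
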
Cut it!

The process of creating releases has been automated via this create release script.

  • Configure the script by specifying your Apache username and password and the Apache GPG key passphrase (a sketch follows this list). BE CAREFUL not to accidentally check them in.
  • This script can be run in any directory.
  • Make sure JAVA_HOME is set; otherwise generation of the pre-built packages with make-distribution.sh will fail, and you will have to run the script again with the --package-only option to generate the binary packages / tarballs.
  • Make sure you have password-less access to the Apache webspace (people.apache.org) from the machine you are running the script on. Otherwise uploading the binary tarballs and docs will fail and you will have to upload them manually.
  • Read and understand the script fully before you execute it. It will cut a Maven release, build binary releases and documentation, then copy the binary artifacts to a staging location on people.apache.org.
  • NOTE: You must use git 1.7.X for this or else you'll hit this horrible bug.
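
A hedged sketch of a typical invocation follows. The environment variable names below are placeholders for whatever the script actually reads; confirm them (and the script's name and location) against the script itself before running it:

Code Block
languagebash
# Placeholder variable names -- check the top of the create release script for the real ones
$ export ASF_USERNAME=<your Apache username>
$ export ASF_PASSWORD=<your Apache password>
$ export GPG_PASSPHRASE=<your GPG key passphrase>
$ export JAVA_HOME=<path to JDK>

# Full run: cuts the Maven release, builds the binary packages and docs,
# and copies the artifacts to people.apache.org
$ ./create-release.sh

# If only the packaging step failed, re-run just that step
$ ./create-release.sh --package-only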

After the script has completed, find the open staging repository in Apache Nexus to which the artifacts were uploaded. Close the staging repository and wait for the closing to succeed. Now all the staged artifacts are public!

Audit a Staged Release Candidate

The process of auditing a release has been automated via this release audit script.

  • Find the staging repository in Apache Nexus to which the artifacts were uploaded.
  • Configure the script by specifying the version number to audit, the key ID of the signing key, and the URL of the staging repository.
  • This script has to be run from the script's parent directory.
  • Make sure "sbt" is installed and is at least version 0.13.5. It's likely that "apt-get" will give you an older version, so it's best to download the Debian package and install it yourself (a sketch follows this list).

The release auditor will test example builds against the staged artifacts, verify signatures, and check for common mistakes made when cutting a release.

Call a vote on the Release Candidate

The release voting takes place on the Apache Spark developers list (the PMC is voting). Look at past vote threads to see how this goes. They should look like the draft below.

  • Make a shortened link to the full list of JIRAs using  http://s.apache.org/
  • If possible, attach a draft of the release notes with the e-mail.
  • Make sure the voting closing time is in UTC format. Use this script to generate it (or see the sketch after this list).
  • Make sure the email is in text format.
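
If the script is not at hand, a hedged alternative on a machine with GNU date is to compute the closing time directly, assuming the usual 72-hour voting window:

Code Block
languagebash
# Print the vote closing time, 72 hours from now, in UTC
$ date -u -d "+72 hours"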

...

Example subject: [VOTE] Release Apache Spark 0.9.1 (rc1)

...

 

Roll Back Release Candidates

If a release candidate does not pass, it is necessary to roll back the commits which advanced Spark's versioning.

Code Block
languagebash
# Check out the release branch from the Apache repo
$ git checkout branch-0.9

# Delete earlier tag. If you are using RC-based tags (v0.9.1-rc1) then skip this.
$ git tag -d v0.9.1
$ git push origin :v0.9.1

# Revert changes made by the Maven release plugin 
$ git revert HEAD --no-edit    # revert dev version commit
$ git revert HEAD~2 --no-edit  # revert release commit
$ git push apache HEAD:branch-0.9

 

Finalizing the Release

Performing the Final Release in Nexus

Warning
titleBe Careful!

Make sure you choose the correct staging repository. THIS STEP IS IRREVERSIBLE.

  • Find the staging repository and click "Release" and confirm. 

Uploading Final Source and Binary Artifacts

Warning
titleBe Careful!

Once you move the artifacts into the release folder, they cannot be removed. THIS STEP IS IRREVERSIBLE.

To upload the binaries, you first upload them to the "dev" directory in the Apache Distribution repo, and then move them from the "dev" directory to the "release" directory. This "move" is the only way you can add anything to the actual release directory.

Code Block
languagebash
# Check out the Spark directory in the Apache distribution SVN "dev" repo
$ svn co https://dist.apache.org/repos/dist/dev/spark/
$ cd spark/

# Make a directory for this RC inside the checkout
$ mkdir spark-0.9.1-rc3

# Download the voted binaries and add them to the directory (make a subdirectory for the RC)
$ scp tdas@people.apache.org:~/public_html/spark-0.9.1-rc3/* spark-0.9.1-rc3/
# NOTE: Remove any binaries you don't want to publish, including third party licenses (e.g. MapR).
# Verify md5 sums
$ svn add spark-0.9.1-rc3
$ svn commit -m "Adding spark-0.9.1-rc3"

# Move the subdirectory in "dev" to the corresponding directory in "release"
# (svn mv between URLs commits immediately, so it needs a log message)
$ svn mv -m "Releasing Spark 0.9.1" https://dist.apache.org/repos/dist/dev/spark/spark-0.9.1-rc3 https://dist.apache.org/repos/dist/release/spark/spark-0.9.1
# Look at http://www.apache.org/dist/spark/ to make sure it's there. It may take a while for them to be visible.
# This will be mirrored throughout the Apache network.

 

Packaging and Wrap-Up for the Release

...

Update the Spark Apache repository

  • Checkout the tagged commit for the release candidate and apply the correct version tag

    Code Block
    languagebash
    # Apply the correct tag
    $ git checkout v0.9.1-rc3    # checkout the RC that passed 
    $ git tag v0.9.1
    $ git push apache v0.9.1
     
    # Verify on the Apache git repo that the tag has been applied correctly
     
    # Remove the old tag
    $ git push apache :v0.9.1-rc3
  • Update remaining version numbers in the release branch
    • If you are doing a patch release, see the similar commit made after the previous release in that branch. For example, for branch 1.0, see this example commit.
    • In general, the rules are as follows (a sketch follows this list). Grep through the repository to find such occurrences.
      • References to the just-released version - upgrade them to the next release version. If it is not a documentation-related version (e.g. inside spark/docs/ or inside spark/python/epydoc.conf), then make sure you add -SNAPSHOT.
      • References to the next version - make sure that they have -SNAPSHOT at the end.
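
For example, a hedged sketch of that check, using 1.0.1 as the just-released version and 1.0.2 as the next one (versions are illustrative):

Code Block
languagebash
# Stale references to the just-released version that should now be 1.0.2-SNAPSHOT
# (documentation-related files such as spark/docs/ and spark/python/epydoc.conf are exceptions)
$ git grep -n "1\.0\.1" | grep -v "docs/" | grep -v "epydoc.conf"

# References to the next version that are missing the -SNAPSHOT suffix
$ git grep -n "1\.0\.2" | grep -v "SNAPSHOT"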

...

  • Upload the binary packages to the S3 bucket s3n://spark-related-packages (ask pwendell to do this)
  • Alter the init scripts in mesos/spark-ec2 repository to pull new binaries (see this example commit)
  • You can audit the ec2 set-up by launching a cluster and running this audit script (a sketch follows this list)
    • Make sure you create the cluster with the default instance type (m1.xlarge)
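
A hedged sketch of launching and tearing down such a cluster with the spark-ec2 script (the key pair, identity file, and cluster name are placeholders; double-check the flags against ./spark-ec2 --help):

Code Block
languagebash
# Launch a small cluster with the default instance type (m1.xlarge) for the audit
$ ./spark-ec2 -k <keypair name> -i <path to keypair .pem> -s 1 launch release-audit-test

# Run the audit script against the cluster, then tear it down
$ ./spark-ec2 destroy release-audit-test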

...

The website repo is at: https://svn.apache.org/repos/asf/spark

Code Block
languagebash
$ svn co https://svn.apache.org/repos/asf/spark

Copy the new documentation to spark/site/docs and update the "latest" link. Make sure the docs were generated with the PRODUCTION=1 flag and Java 7; if they were not, regenerate them.

Code Block
languagebash
$ PRODUCTION=1 jekyll build

...

Code Block
# Determine PR numbers closed only in the new release.
git log v1.1.0-rc4 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.1
git log v1.0.0 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.0
diff --new-line-format="" --unchanged-line-format="" closed_1.1 closed_1.0 > diff.txt

# Grep expression with all new patches
expr=$(cat diff.txt | awk '{ print "\\("$1" "$2" \\)"; }' | tr "\n" "|" | sed -e "s/|/\\\|/g" | sed "s/\\\|$//")

# Contributor list:
git shortlog v1.1.0-rc4 --grep "$expr"
 
# Large patch list (300+ lines):
git log v1.1.0-rc4 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][0-9][0-9][0-9] insert" | grep SPARK

...

  • Code Block
    languagebash
    # svn rm on a repository URL commits immediately, so supply the log message inline
    svn rm -m "Removing Spark 0.9.2 release" https://dist.apache.org/repos/dist/release/spark/spark-0.9.2

...

 

Miscellaneous

Steps to create the AMI useful for making releases

...

languagebash

...

Moved permanently to http://spark.apache.org/release-process.html