This document covers the process for managing Spark releases.


Prerequisites for Managing A Release

Create a GPG Key (https://www.apache.org/dev/release-signing)

Code Block
languagebash
# ---- Install GPG ----
# For Ubuntu, install through apt-get
sudo apt-get install gnupg
# For Mac OSX, install GPG Suite from http://gpgtools.org

# ---- Generate key ----
$ gpg --gen-key                   # Create new key, make sure it is RSA and 4096 bits (see https://www.apache.org/dev/openpgp.html#generate-key)
$ gpg --output <KEY_ID>.asc --export -a <KEY_ID>  # Generate public key file for distribution to Apache infrastructure

# ---- Distribute key ----
$ gpg --send-key <KEY_ID>         # Distribute public key to a key server, <KEY_ID> is the 8 HEX characters in the output of the previous command "pub  4096R/<KEY_ID> "
$ gpg --fingerprint               # Get key digest
# Open http://id.apache.org , log in with your Apache account, and upload the key digest
$ scp <KEY_ID>.asc <USER_NAME>@people.apache.org:~/   # Copy generated <KEY_ID>.asc to Apache web space
# Create an FOAF file and add it via svn (see http://people.apache.org/foaf/ )
#   - should include key fingerprint
# Eventually key will show up on apache people page (e.g. https://people.apache.org/keys/committer/pwendell.asc )

Get Access to Apache Nexus for Publishing Artifacts

Get "Push" Access to Apache Git Repository

Preparing the Code for a Release

Ensure Spark is Ready for a Release

  • Check JIRA for remaining issues tied to the release
    • Review and merge any blocking features
    • Bump other remaining features to subsequent releases
  • Ensure Spark versions are correct in the codebase (a grep sketch for checking this appears after this list)
    • See this example commit
    • The places to change are:
      • SBT build: Change version in file 'project/SparkBuild.scala'
      • Maven build: Change the version in ALL the pom.xml files in the repo. Note that the version should be SPARK-VERSION_SNAPSHOT; it will be changed to SPARK-VERSION automatically by Maven when cutting the release.
        • Exception: Change 'yarn/alpha/pom.xml' to SPARK-VERSION. Note that this is different from the main 'pom.xml' because the YARN alpha module does not get published as an artifact through Maven when cutting the release and so does not get version bumped from SPARK-VERSION_SNAPSHOT to SPARK-VERSION.
      • Spark REPLs
        • Scala REPL: Check inside 'repl/src/main/scala/org/apache/spark/repl/'
        • Python REPL: Check inside 'python/pyspark'
      • Docs: Change in file 'docs/_config.yml'
      • Spark EC2 scripts: Change the mapping between Spark and Shark versions and the default Spark version in the cluster
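One quick way to sanity-check the version bumps is to grep the tree for stale version strings. This is only a sketch; the version numbers shown are illustrative and should be replaced with the actual previous and upcoming versions.

Code Block
languagebash
# Illustrative version numbers; substitute the real ones.
OLD_VERSION="0.8.1-incubating"
NEW_VERSION="0.9.0-incubating"
# Any remaining reference to the old version may still need a bump.
git grep -n "$OLD_VERSION" -- project docs python repl ec2
# Spot-check that the expected files already carry the new version.
git grep -n "$NEW_VERSION" -- project/SparkBuild.scala docs/_config.yml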

Check out and run tests

Code Block
languagebash
$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-0.9
$ cd spark
$ sbt/sbt assembly
$ export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g"
$ mvn test

Check for dead links in the docs

Code Block
languagebash
$ cd $SPARK_HOME/docs
$ jekyll serve --watch
$ sudo apt-get install linkchecker
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings

 

Run License Audit Tool

Check whether all the source files have Apache headers.  Note that Spark REPL files and some pyspark files will have other license headers (as they are not Apache licensed) and should be ignored.

Code Block
languagebash
$ java -jar /path/to/apache-rat-0.10.jar --dir . --exclude "*.md" > rat_results.txt
$ vi rat_results.txt
$ # Look for source files that seem to have missing headers
$ cat rat_results.txt  | grep "???" | grep -e \.scala$ -e \.java$ -e \.py$ -e \.sh$
$ # Add missing headers if necessary

Create CHANGES.txt File

Code Block
languagebash
# Append to CHANGES.txt file required by Apache
# If doing a minor release, append to existing CHANGES.txt file in release branch
# If doing a major release, copy CHANGES.txt file from last major release
#  and append to it (shown below)
$ cat CHANGES.txt | tail -n +3 > OLD_CHANGES.txt
$ echo "Spark Change Log" > CHANGES.txt
$ echo "" >> CHANGES.txt
$ echo "Release 0.9.0-incubating" >> CHANGES.txt
$ echo "" >> CHANGES.txt
$ # The script below is ugly, but this will be much easier
$ # once all PRs use the new merge format.
$ git log v0.8.0-incubating..HEAD \
>   --grep "pull request" \
>   --pretty="QQ  %h %cd%nQQ  %s%nQQ  QQQ%b%nQQ"  \
>   | grep QQ | sed s/QQ// | sed "s/^  QQQ\(.*\)$/  [\1]/" >> CHANGES.txt
$ cat OLD_CHANGES.txt >> CHANGES.txt
$ rm OLD_CHANGES.txt
$ git add CHANGES.txt && git commit -m "Change log for release 0.9.0-incubating"

Cutting a Release Candidate

Overview

Cutting a release candidate involves two steps. First, we use the Maven release plug-in to create a release commit (a single commit where all of the version files have the correct number) and publish the code associated with that release to a staging repository in Maven. Second, we check out that release commit and package binary releases and documentation.
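In shell terms, the two steps look roughly like the following. This is only a sketch of what the release scripts automate; the goals are the standard maven-release-plugin ones, and the version and tag names are illustrative.

Code Block
languagebash
# Step 1: create the release commit and stage the Maven artifacts.
# release:prepare rewrites the version files and tags the release commit;
# release:perform builds from that tag and deploys to the Apache staging repository.
mvn release:prepare -DreleaseVersion=0.9.1 \
    -DdevelopmentVersion=0.9.2-SNAPSHOT -Dtag=v0.9.1
mvn release:perform

# Step 2: check out the release commit and package binary releases and documentation.
git checkout v0.9.1
sbt/sbt assembly              # binary distribution
(cd docs && jekyll build)     # generated documentation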

Setting up EC2 Instance (Recommended)

  • The process of cutting a release requires a number of tools to be installed locally (Maven, Jekyll, etc.). Ubuntu users can install those tools via apt-get (a sketch of this appears after this list). However, it may be most convenient to use an EC2 instance based on the AMI ami-4c721b7c (available in US-West), which has all the necessary tools installed. Mac users in particular are recommended to use an EC2 instance instead of attempting to install all the necessary tools locally. If you want to prepare your own EC2 instance, follow the steps given in the Miscellaneous section at the end of this document.
  • Consider using CPU-optimized instances, which may provide better bang for the buck.
  • Transfer your GPG keys from your home machine to the EC2 instance.

    Code Block
    languagebash
    # == On home machine ==
    gpg --list-keys  # Identify the KEY_ID of the key you generated
    gpg --output pubkey.gpg --export <KEY_ID>
    gpg --output - --export-secret-key <KEY_ID> | cat pubkey.gpg - | gpg --armor --output keys.asc --symmetric --cipher-algo AES256
    # Copy keys.asc to EC2 instance
     
    # == On EC2 machine ==
    gpg --no-use-agent --output - keys.asc | gpg --import
    rm keys.asc
  • Download the appropriate version of Spark that has the right release-related scripts.
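If you do prepare your own Ubuntu instance, the following is a minimal sketch of installing the tools mentioned above; the package names are assumptions and may differ between Ubuntu releases.

Code Block
languagebash
# Assumed package names; adjust for your Ubuntu release.
sudo apt-get update
sudo apt-get install -y git openjdk-7-jdk maven gnupg ruby ruby-dev
# Jekyll (used to build the docs) is installed through RubyGems.
sudo gem install jekyll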

Creating Release Candidates

  • Make sure Maven is configured with your Apache username and password. Your ~/.m2/settings.xml should have the following.

    Code Block
    languagexml
    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                                  http://maven.apache.org/xsd/settings-1.0.0.xsd">
      <servers>
        <server>
          <id>apache.snapshots.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
        <server>
          <id>apache.releases.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
      </servers>
    </settings>
  • The process of creating releases has been automated via this create release script
    • Configure the script by specifying your Apache username and password and the Apache GPG key passphrase. Be careful not to accidentally check them in.
    • This script can be run in any directory.
    • Read and understand the script fully before you execute it. It will cut a Maven release, build binary releases and documentation, then copy the binary artifacts to a staging location on people.apache.org.
    • NOTE: You must use git 1.7.X for this or else you'll hit this horrible bug.
  • After the script has completed, find the open staging repository in Apache Nexus to which the artifacts were uploaded. Close the staging repository and wait for the closing to succeed. Now all the staged artifacts are public!

Rolling Back Release Candidates

  • If a release candidate does not pass, it is necessary to roll back the commits which advanced Spark's versioning.

    Code Block
    languagebash
    $ git fetch apache
    $ git checkout apache/branch-0.8
    $ git tag -d v0.8.1-incubating
    $ git push apache :v0.8.1-incubating   # delete the tag from the Apache remote
    $ git revert HEAD --no-edit    # revert dev version commit
    $ git revert HEAD~2 --no-edit  # revert release commit
    $ git push apache HEAD:branch-0.8

Auditing a Staged Release Candidate

...

  • Find the staging repository in Apache Nexus to which the artifacts were uploaded.
  • Configure the script by specifying the version number to audit, the key ID of the signing key, and the URL of the staging repository.
  • This script has to be run from the script's parent directory.
  • Make sure "sbt" is installed.
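Independent of the audit script, you can spot-check the staged artifacts by hand. The commands below are only a sketch; the URLs and file names are illustrative (taken from the vote template further down).

Code Block
languagebash
# Illustrative URLs and file names; use the actual staging location of the candidate.
wget http://people.apache.org/~tdas/spark-0.9.1-rc1/spark-0.9.1.tgz
wget http://people.apache.org/~tdas/spark-0.9.1-rc1/spark-0.9.1.tgz.asc
# Import the release manager's public key, then verify the signature.
wget -O signer.asc https://people.apache.org/keys/committer/tdas.asc
gpg --import signer.asc
gpg --verify spark-0.9.1.tgz.asc spark-0.9.1.tgz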

...

Setting up EC2 Instance for Preparing and Creating Release Candidates

  • You can either use the AMI ami-4c721b7c (available in US-West), which has all the necessary tools installed, or prepare your own instance by following the steps given in the Miscellaneous section at the end of this document.
  • Transfer your GPG keys from your home machine to the EC2 instance

    Code Block
    languagebash
    # == On home machine ==
    gpg --list-keys  # Identify the KEY_ID of the key you generated
    gpg --output pubkey.gpg --export <KEY_ID>
    gpg --output - --export-secret-key <KEY_ID> | cat pubkey.gpg - | gpg --armor --output keys.asc --symmetric --cipher-algo AES256
    # Copy keys.asc to EC2 instance
     
    # == On EC2 machine ==
    gpg --no-use-agent --output - keys.asc | gpg --import
    rm keys.asc
  • Edit ~/.m2/settings.xml and specify your Apache user name and password. 

    Code Block
    languagexml
    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                                  http://maven.apache.org/xsd/settings-1.0.0.xsd">
      <servers>
        <server>
          <id>apache.snapshots.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
        <server>
          <id>apache.releases.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
      </servers>
    </settings>
  • Download the appropriate version of Spark that has the right release-related scripts.

Calling a Release Vote

  • The release voting happens in two stages. First, a vote takes place on the Apache Spark developers list (the podling PMC or PPMC is voting), then one takes place on the general@i.a.o list (the IPMC). I used the same template for both votes. Look at past vote threads to see how this goes. Once the vote is finished you should also send out a summary e-mail with the totals (subject “[RESULT] [VOTE]...”).
  • If possible, attach a draft of the release notes with the e-mail
  • Attach the CHANGES.txt file in the e-mail
  • NOTE: This will change once we graduate and there will be a single vote
Panel
title[VOTE] Release Apache Spark 0.9.1 (rc1)

Please vote on releasing the following candidate as Apache Spark version 0.9.1

A draft of the release notes along with the CHANGES.txt file is attached to this e-mail.

The tag to be voted on is v0.9.1 (commit 81c6a06c):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=81c6a06c796a87aaeb5f129f36e4c3396e27d652

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~tdas/spark-0.9.1-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1007/

The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-0.9.1-rc1-docs/

Please vote on releasing this package as Apache Spark 0.9.1!

The vote is open until Thursday, September 19th at 05:00 UTC and passes if
a majority of at least 3 +1 [PPMC/IPMC] votes are cast.

[ ] +1 Release this package as Apache Spark 0.9.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

Cutting the Official Release

Performing the Final Release in Nexus

Warning
titleBe Careful!

Make sure you choose the correct staging repository. THIS STEP IS IRREVERSIBLE.

  • Find the staging repository, click "Release", and confirm.

Uploading Final Source and Binary Artifacts

Warning
titleBe Careful!

Once you move the artifacts into the release folder, they cannot be removed. THIS STEP IS IRREVERSIBLE.

Code Block
languagebash
# Create SVN folder and add the release artifacts there:
# https://dist.apache.org/repos/dist/dev/incubator/spark/spark-0.9.0-incubating-rc5
$ scp pwendell@people.apache.org:~/public_html/spark-0.9.0-incubating-rc5/* spark-0.9.0-incubating-rc5/
# Verify md5 sums
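# (Sketch of one way to verify, assuming GNU md5sum and that each artifact has
#  a sibling .md5 file; compare the two digests by eye.)
$ for f in spark-0.9.0-incubating-rc5/*.tgz; do md5sum "$f"; cat "$f.md5"; echo; done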
$ svn add spark-0.9.0-incubating-rc5
$ svn commit -m "Adding spark-0.9.0-incubating-rc5"
$ svn mv https://dist.apache.org/repos/dist/dev/incubator/spark/spark-0.9.0-incubating-rc5 \
>    https://dist.apache.org/repos/dist/release/incubator/spark/spark-0.9.0-incubating
# Look at http://www.apache.org/dist/incubator/spark/ to make sure it's there.
# This will be mirrored throughout the Apache network.

 

Packaging and Wrap-Up for the Release

  • Update remaining version numbers in the release branch (see this example commit)
  • Update the spark-ec2 scripts
    • Upload the binary packages to the spark-related-packages bucket in S3 and make them public (see the command sketch after this list)
    • Alter the init scripts in amplab/spark-ec2 repository to pull new binaries (see this example commit)
    • You can audit the ec2 set-up by launching a cluster and running this audit script
  • Update the Spark website
    • The website repo is at: https://svn.apache.org/repos/asf/incubator/spark
    • Copy new documentation to /site/docs and update the "latest" link (see the sketch after this list)
    • NOTE: For the below items, look at how previous releases are documented on the site
    • Create release notes
    • Update documentation page
    • Update downloads page
    • Update the main page with a news item
  • Once everything is working (ec2, website docs, website changes) create an announcement on the website and then send an e-mail to the mailing list
  • Enjoy an adult beverage of your choice, congrats on making a Spark release
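For the S3 upload mentioned above, a minimal sketch using s3cmd (the package file name is illustrative; any S3 client that can set a public ACL works):

Code Block
languagebash
# Illustrative artifact name; repeat for each binary package.
s3cmd put --acl-public spark-0.9.1-bin-hadoop2.tgz s3://spark-related-packages/

And for copying the documentation into the website repo, another sketch (the local checkout name and docs version directory are illustrative; updating the "latest" link is done as described above):

Code Block
languagebash
# Check out the website, add the generated docs, and commit.
svn co https://svn.apache.org/repos/asf/incubator/spark spark-site
cp -r $SPARK_HOME/docs/_site spark-site/site/docs/0.9.1
cd spark-site
svn add site/docs/0.9.1
svn commit -m "Add docs for Spark 0.9.1"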

 

Miscellaneous

Steps to create the AMI useful for making releases

...

Moved permanently to http://spark.apache.org/release-process.html