...

  • Check JIRA for remaining issues tied to the release
    • Review and merge any blocking features
    • Bump other remaining features to subsequent releases

    Make sure you have configured git author info:

    Code Block
    languagebash
    $ git config --global user.name <GIT USERNAME>
    $ git config --global user.email <GIT EMAIL ADDRESS>
  • Ensure Spark versions are correct in the codebase

    • See this example commit
    • You should "grep" through the codebase to find all instances of the version string. Some known places to change are:
      • SparkContext.scala version string (only for branch-1.x)
      • SBT build: Change version in file 'project/SparkBuild.scala'
      • Maven build: Change version in ALL the pom.xml files in repo. Note that the version should be SPARK-VERSION_SNAPSHOT and it will be changed to SPARK-VERSION automatically by Maven when cutting the release.
        • Exception: Change 'yarn/alpha/pom.xml' to SPARK-VERSION. Note that this is different from the main 'pom.xml' because the YARN alpha module does not get published as an artifact through Maven when cutting the release and so does not get version bumped from SPARK-VERSION_SNAPSHOT to SPARK-VERSION.
      • Spark REPLs
        • Scala REPL: Check inside 'repl/src/main/scala/org/apache/spark/repl/'
        • Python REPL: Check inside 'python/pyspark'
      • Docs: Change in file 'docs/_config.yml'
      • Spark EC2 scripts: Change the mapping between Spark and Shark versions, and the default Spark version used in clusters

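The "grep" step above can be sketched concretely; the version strings below are hypothetical placeholders for the actual old and new versions:

```shell
# Hypothetical sketch: find leftover occurrences of the previous version
# string (1.0.1-SNAPSHOT here) from the root of the Spark checkout.
grep -rn "1.0.1-SNAPSHOT" . --include=pom.xml || true
# Also check the known non-Maven spots listed above:
grep -rn "1.0.1" docs/_config.yml project/SparkBuild.scala 2>/dev/null || true
```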
...

Code Block
languagebash
$ cd $SPARK_HOME/docs
$ jekyll serve --watch
$ sudo apt-get install linkchecker
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings


Cutting a Release Candidate

Overview

Cutting a release candidate involves two steps. First, we use the Maven release plug-in to create a release commit (a single commit where all of the version files have the correct number) and publish the code associated with that release to a staging repository in Maven. Second, we check out that release commit and package binary releases and documentation.

Set up EC2 Instance (Recommended)

The process of cutting a release requires a number of tools to be installed locally (Maven, Jekyll, etc.). Ubuntu users can install those tools via apt-get. However, it may be most convenient to use an EC2 instance based on the AMI ami-8e98edbe (available in US-West, with Scala 2.10.3 and SBT 0.13.1 installed). This has all the necessary tools installed. Mac users especially are recommended to use an EC2 instance instead of attempting to install all the necessary tools. If you want to prepare your own EC2 instance (with a different version of Scala, SBT, etc.), follow the steps given in the Miscellaneous section at the end of this document.

  • Consider using CPU-optimized instances, which may provide better bang for the buck.
  • Transfer your GPG keys from your home machine to the EC2 instance.

    Code Block
    languagebash
    # == On home machine ==
    $ gpg --list-keys  # Identify the KEY_ID of the key you generated
    $ gpg --output pubkey.gpg --export <KEY_ID>
    $ gpg --output - --export-secret-key <KEY_ID> | cat pubkey.gpg - | gpg --armor --output keys.asc --symmetric --cipher-algo AES256
    # Copy keys.asc to EC2 instance
     
    # == On EC2 machine ==
    # May be necessary if the ownership of the gpg files is not set to the current user
    $ sudo chown -R ubuntu:ubuntu ~/.gnupg/*
    
    # Import the keys
    $ sudo gpg --no-use-agent --output - keys.asc | gpg --import
     
    # Confirm that your key has been imported, then remove the keys file
    $ gpg --list-keys
    $ rm keys.asc
    
  • Install your private key that allows you password-less access to the Apache webspace (people.apache.org).

  • Set git user name and email (these are going to appear as the committer in the release commits).

    Code Block
    languagebash
    $ git config --global user.name "Tathagata Das"
    $ git config --global user.email tathagata.das1565@gmail.com
  • Check out the appropriate version of Spark that has the right scripts related to the releases. For instance, to check out the master branch, run "git clone https://git-wip-us.apache.org/repos/asf/spark.git".

...

Set up Maven

Make sure Maven is configured with your Apache username and password. Your ~/.m2/settings.xml should have the following.

Code Block
languagexml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                      http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <servers>
    <server>
      <id>apache.snapshots.https</id>
      <username>APACHE_USERNAME</username>
      <password>PASSWORD</password>
    </server>
    <server>
      <id>apache.releases.https</id>
      <username>APACHE_USERNAME</username>
      <password>PASSWORD</password>
    </server>
  </servers>
</settings>

...


...

 


Creating Release Candidates

Create/update CHANGES.txt

 The new CHANGES.txt can be generated using this script.

  • Checkout the Spark release version in a Spark git repository. 
  • Download the script to a location within the repo.
  • Update the previous release tag and other information in the script.
  • Set SPARK_HOME environment variable and run the script.

    Code Block
    languagebash
    $ export SPARK_HOME="..."
    $ python -u generate-changelist.py
Update JIRA

If this is not the first RC, then make sure that JIRA issues that have been solved since the last RC (that is, issues that are going to make it into this new RC) are marked as fixed in this release version.

  • A possible protocol for this is to mark such JIRA issues as fixed in the next maintenance release. E.g., if you are cutting an RC for 1.0.2, mark such issues as fixed in 1.0.3; for an RC of 1.1, mark them as fixed in 1.1.1.
  • When cutting a new RC, find all the issues that are marked as fixed for the next maintenance release and change them to the current release. Also verify from the git log whether they are actually making it into the new RC.
Cut it!

The process of creating releases has been automated via this create release script.

  • Configure the script by specifying the Apache username and password and the Apache GPG key passphrase. BE CAREFUL not to accidentally check them in.
  • This script can be run in any directory.
  • Make sure you have JAVA_HOME set; otherwise generation of the pre-built packages with make-distribution.sh will fail, and you will have to run the script again with the option --package-only to regenerate the binary packages/tarballs.
  • Make sure you have password-less access to the Apache webspace (people.apache.org) from the machine you are running the script on. Otherwise uploading of the binary tarballs and docs will fail, and you will have to upload them manually.
  • Read and understand the script fully before you execute it. It will cut a Maven release, build binary releases and documentation, then copy the binary artifacts to a staging location on people.apache.org.
  • NOTE: You must use git 1.7.x for this, or else you'll hit this horrible bug.
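The git 1.7.x requirement can be checked up front; a minimal hedged sketch:

```shell
# Warn early if the local git is not 1.7.x, since the Maven release
# plugin is known to misbehave with other versions (see the note above).
ver=$(git --version | awk '{print $3}')
case "$ver" in
  1.7.*) echo "git $ver: OK for the release plugin" ;;
  *)     echo "git $ver: not 1.7.x -- expect trouble with the release plugin" ;;
esac
```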

After the script has completed, find the open staging repository in Apache Nexus to which the artifacts were uploaded. Close the staging repository. Wait for the closing to succeed. Now all the staged artifacts are public!


Auditing a Staged Release Candidate

  • The process of auditing releases has been automated via this release audit script.
    • Find the staging repository in Apache Nexus to which the artifacts were uploaded.
    • Configure the script by specifying the version number to audit, the key ID of the signing key, and the URL to the staging repository.
    • This script has to be run from the parent directory for the script.
    • Make sure "sbt" is installed.
  • The release auditor will test example builds against the staged artifacts, verify signatures, and check for common mistakes made when cutting a release.
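Part of what the auditor checks can be reproduced by hand. A self-contained sketch of the digest check, with a hypothetical artifact name standing in for a real staged tarball:

```shell
# Create a stand-in artifact, record its digest, and verify it -- the
# same digest check you would run against real staged tarballs.
tmp=$(mktemp -d) && cd "$tmp"
echo "release artifact bytes" > spark-1.0.2-rc1.tgz
sha512sum spark-1.0.2-rc1.tgz > spark-1.0.2-rc1.tgz.sha512
sha512sum -c spark-1.0.2-rc1.tgz.sha512
# Signatures are checked similarly: gpg --verify <artifact>.asc <artifact>
```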

Calling a Vote on the Release Candidate

  • The release voting takes place on the Apache Spark developers list (the PMC is voting). Look at past vote threads to see how this goes. They should look like the draft below.
    • Make a shortened link to the full list of JIRAs using http://s.apache.org/
    • If possible, attach a draft of the release notes with the e-mail.
    • Make sure the vote closing time is in UTC format. Use this script to generate it.
    • Make sure the email is in text format.
  • Once the vote is done, you should also send out a summary e-mail with the totals (subject “[RESULT] [VOTE]...”).

    Panel
    borderColorblack
    title[VOTE] Release Apache Spark 1.0.2 (rc1)
    borderStylesolid

    Please vote on releasing the following candidate as Apache Spark version 1.0.2.

    This release fixes a number of bugs in Spark 1.0.1.
    Some of the notable ones are
    SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for
    SPARK-1199. The fix was reverted for 1.0.2.
    SPARK-2576: NoClassDefFoundError when executing Spark QL query on
    HDFS CSV file.
    The full list is at http://s.apache.org/9NJ

    The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
    https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f

    The release files, including signatures, digests, etc. can be found at:
    http://people.apache.org/~tdas/spark-1.0.2-rc1/

    Release artifacts are signed with the following key:
    https://people.apache.org/keys/committer/tdas.asc

    The staging repository for this release can be found at:
    https://repository.apache.org/content/repositories/orgapachespark-1024/

    The documentation corresponding to this release can be found at:
    http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/

    Please vote on releasing this package as Apache Spark 1.0.2!

    The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
    a majority of at least 3 +1 PMC votes are cast.
    [ ] +1 Release this package as Apache Spark 1.0.2
    [ ] -1 Do not release this package because ...

    To learn more about Apache Spark, please see
    http://spark.apache.org/

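The UTC closing time mentioned above (Apache votes normally run for at least 72 hours) can also be generated with plain date(1) if the linked script is unavailable; a sketch for GNU date:

```shell
# Print a vote-closing timestamp 72 hours from now, in UTC.
date -u -d "+72 hours" "+%A, %B %d, at %H:%M UTC"
# BSD/macOS variant: date -u -v +72H "+%A, %B %d, at %H:%M UTC"
```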
Rolling Back Release Candidates

  • If a release candidate does not pass, it is necessary to roll back the commits which advanced Spark's versioning.

    Code Block
    languagebash
    # Checkout the release branch from Apache repo
     
    # Delete earlier tag. If you are using RC-based tags (v0.9.1-rc1) then skip this.
    $ git tag -d v0.9.1
    $ git push origin :v0.9.1
    
    # Revert changes made by the Maven release plugin 
    $ git revert HEAD --no-edit    # revert dev version commit
    $ git revert HEAD~2 --no-edit  # revert release commit
    $ git push apache HEAD:branch-0.9
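The two-revert pattern above (dev-version commit at HEAD, release commit at HEAD~2 once the first revert has landed) can be exercised end-to-end in a throwaway repository; all names and versions here are hypothetical:

```shell
set -e
# Build a scratch repo with the same three-commit shape the release
# plugin produces: dev version -> release commit -> next dev version.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name "You"
echo "version=0.9.1-SNAPSHOT" > versions.txt
git add . && git commit -qm "dev version"
echo "version=0.9.1" > versions.txt && git commit -qam "release commit"
echo "version=0.9.2-SNAPSHOT" > versions.txt && git commit -qam "next dev version"
git revert --no-edit HEAD      # undo the next-dev-version commit
git revert --no-edit HEAD~2    # undo the release commit (now two back)
cat versions.txt               # back to version=0.9.1-SNAPSHOT
```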

Cutting the Official Release

Performing the Final Release in Nexus

...