This is a checklist for community members to validate new Apache Hadoop releases.

Overview

By ASF policy, the PMC votes on the release artifacts hosted at dist.apache.org. For example, for Apache Hadoop 3.1.0, the following artifacts are covered by this policy:

  1. hadoop-3.1.0-src.tar.gz
  2. hadoop-3.1.0-src.tar.gz.asc
  3. hadoop-3.1.0-src.tar.gz.mds
  4. hadoop-3.1.0.tar.gz
  5. hadoop-3.1.0.tar.gz.asc
  6. hadoop-3.1.0.tar.gz.mds

Additionally, it is a good idea to verify the Maven artifacts staged on repository.apache.org, since downstream projects consume them.

Step-by-step guide

You do not need to complete every check below before voting on a release.

Check release artifacts

Verify that the release bits were generated correctly. These steps do not test release functionality.

  1. Verify signatures using the instructions at https://www.apache.org/info/verification.html#CheckingSignatures. You will need GPG installed; on macOS, you can install it with Homebrew using brew install gpg.
  2. Verify checksums for the source and binary artifacts against the corresponding .mds files. For example, a quick way to do so with gpg is:

    gpg --print-mds hadoop-3.1.0-src.tar.gz > mds.tmp
    diff hadoop-3.1.0-src.tar.gz.mds mds.tmp
  3. Verify that no MD5 checksum files (.md5) are provided.
  4. Verify that jars have been correctly staged to repository.apache.org.
    1. For 3.x and later releases, verify that the shaded client jars (e.g. hadoop-client-api and hadoop-client-runtime) look correct.
  5. Sanity-check the CHANGES.md and RELEASENOTES.md files.
  6. Verify that source and binary tarballs include LICENSE.txt and NOTICE.txt files.
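
The signature and checksum checks above can be scripted. The following is a minimal bash sketch, not part of any Hadoop tooling: the function names are hypothetical, and it assumes the RC artifacts with their .asc and .mds files sit in the current directory and that the KEYS file published with Hadoop releases has already been imported with gpg --import KEYS.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking RC artifacts. Assumes gpg is installed,
# the Hadoop KEYS file has been imported, and the artifacts plus their
# .asc/.mds files are in the current directory.

# Check the detached signature <artifact>.asc against <artifact>.
verify_signature() {
  local artifact="$1"
  gpg --verify "${artifact}.asc" "${artifact}"
}

# Recompute the digests with gpg and compare them with <artifact>.mds.
# gpg prints the file name exactly as given, so pass the bare file name
# from the directory that holds the artifact.
verify_mds() {
  local artifact="$1"
  gpg --print-mds "${artifact}" | diff "${artifact}.mds" -
}

# Fail if any standalone .md5 checksum file sits next to the artifacts.
check_no_md5() {
  local dir="${1:-.}"
  local found
  found=$(find "${dir}" -maxdepth 1 -type f -name '*.md5')
  if [ -n "${found}" ]; then
    echo "Unexpected MD5 files:" >&2
    echo "${found}" >&2
    return 1
  fi
}
```

For example, run verify_signature hadoop-3.1.0.tar.gz && verify_mds hadoop-3.1.0.tar.gz for each tarball, then check_no_md5 . once. Piping into diff with - avoids leaving an mds.tmp file behind; any diff output means a mismatch.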

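The LICENSE.txt and NOTICE.txt check can be done without unpacking the tarballs by listing their contents. This is a sketch with a hypothetical function name; it assumes, as with the Hadoop tarballs, that each archive has a single top-level directory (e.g. hadoop-3.1.0/), and only counts files directly inside it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a release tarball carries LICENSE.txt and
# NOTICE.txt in its top-level directory (e.g. hadoop-3.1.0/LICENSE.txt).
check_license_notice() {
  local tarball="$1"
  local listing f
  listing=$(tar -tzf "${tarball}") || return 1
  for f in LICENSE NOTICE; do
    # Match "<top-level-dir>/<name>.txt" only, not copies in nested modules.
    if ! printf '%s\n' "${listing}" | grep -qE "^[^/]+/${f}\\.txt\$"; then
      echo "${tarball}: missing top-level ${f}.txt" >&2
      return 1
    fi
  done
}
```

Run it once per tarball, e.g. check_license_notice hadoop-3.1.0-src.tar.gz and check_license_notice hadoop-3.1.0.tar.gz. It only checks presence; the file contents still need a human read.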
Verify Source Release Bits

  1. Extract the source tarball and build the release from source using:
    1. mvn clean install -Pdist -DskipTests=true -Dtar
  2. The binary tarball will be available at hadoop-dist/target/hadoop-<version>.tar.gz.
  3. Follow all the steps from the Verify Binary Release Bits section below.
  4. Verify that the source distribution contains no extra files or changes by diffing it against the release tag in a local clone of the Hadoop repository. For example:

    $ git checkout release-3.1.0-RC1
    $ diff -r --exclude=.git "$PWD" /tmp/hadoop-3.1.0  # Assuming the RC source tarball was unpacked in /tmp/hadoop-3.1.0
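
An alternative to diffing against a working copy is to export the tag with git archive, which contains only tracked files, so build output and untracked files in your clone do not appear as noise. This is a sketch with a hypothetical function name; run it from inside a clone that has the RC tag fetched. Some differences, such as dot-files that the source assembly leaves out, may be expected, so review any output by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare an unpacked source tarball with the exact
# contents of a git tag. Run from inside a clone that has the tag fetched.
verify_source_tree() {
  local tag="$1" src_dir="$2"
  local tag_dir rc
  # Fail early if the tag does not resolve to a commit.
  if ! git rev-parse --verify -q "${tag}^{commit}" >/dev/null; then
    echo "Unknown tag: ${tag}" >&2
    return 1
  fi
  tag_dir=$(mktemp -d) || return 1
  # Export only the tracked files of the tag into a scratch directory.
  git archive --format=tar "${tag}" | tar -xf - -C "${tag_dir}"
  # Any output here means the tarball and the tag differ.
  diff -r "${tag_dir}" "${src_dir}"
  rc=$?
  rm -rf "${tag_dir}"
  return "${rc}"
}
```

For example: verify_source_tree release-3.1.0-RC1 /tmp/hadoop-3.1.0.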

Verify Binary Release Bits

  1. Install the binary release to a cluster. For example, this may be a single-node pseudo-distributed cluster, a 3-node cluster of VMs, or a real cluster of any size.
  2. Start the HDFS and YARN services and check release functionality. Some example checks that apply to all releases:
    1. Try out file system shell commands.
    2. Try out some admin commands.
    3. Check the various service web UIs.
    4. Enable NameNode HA.
    5. Launch example MapReduce jobs.
    6. Repeat the above steps with Kerberos security enabled.
    7. <Add more YARN-specific checks here>
  3. Check release-specific features.
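
A few of the generic checks above (shell commands, admin commands, web UIs, an example MapReduce job) can be bundled into a quick smoke test. This is only a sketch: the function names and the HDFS scratch path are hypothetical, and it assumes the Hadoop commands and curl are on PATH, HADOOP_HOME points at the unpacked binary release, VERSION names the release under test, and the web UIs listen on the Hadoop 3.x default ports on localhost. It does not replace manual checks such as NameNode HA.

```bash
#!/usr/bin/env bash
# Smoke-test sketch for a freshly started cluster. Assumptions:
#   - hdfs, yarn, hadoop and curl are on PATH,
#   - HADOOP_HOME points at the unpacked binary release,
#   - VERSION is the release under test (e.g. 3.1.0),
#   - the web UIs use the Hadoop 3.x default ports on localhost.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

smoke_test() {
  if [ -z "${VERSION:-}" ] || [ -z "${HADOOP_HOME:-}" ]; then
    echo "Set VERSION and HADOOP_HOME first" >&2
    return 1
  fi
  local examples="${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-${VERSION}.jar"
  local dir=/tmp/release-smoke  # hypothetical scratch directory in HDFS

  # File system shell commands.
  run hdfs dfs -mkdir -p "${dir}" || return 1
  run hdfs dfs -put -f /etc/hosts "${dir}/" || return 1
  run hdfs dfs -ls "${dir}" || return 1
  run hdfs dfs -cat "${dir}/hosts" || return 1

  # Admin commands.
  run hdfs dfsadmin -report || return 1
  run yarn node -list || return 1

  # Web UIs: 9870 (NameNode) and 8088 (ResourceManager) are the 3.x defaults.
  run curl -sf -o /dev/null http://localhost:9870/ || return 1
  run curl -sf -o /dev/null http://localhost:8088/cluster || return 1

  # Example MapReduce job: estimate pi with 2 maps and 10 samples per map.
  run hadoop jar "${examples}" pi 2 10 || return 1
}
```

For example, VERSION=3.1.0 HADOOP_HOME=/path/to/hadoop-3.1.0 smoke_test runs the checks, and adding DRY_RUN=1 only prints them. For the Kerberos pass, obtain a ticket with kinit first (the principal depends on your setup) and run the same function again.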

Verify Site Documentation

  1. Generate and stage the site documentation with the following commands, then verify that the site docs render correctly:
    1. mvn site:site
    2. mkdir -p /tmp/site && mvn site:stage -DstagingDirectory=/tmp/site
    3. Browse to file:///tmp/site/hadoop-project/index.html.
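
Before browsing, a quick sanity pass over the staging directory can catch obviously broken output. This is a sketch with a hypothetical function name; it assumes the /tmp/site staging directory used above and treats zero-byte HTML pages as a likely sign that a page failed to render. It does not check links or content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the staged site before browsing it.
# Assumes the staging directory used above (/tmp/site) by default.
check_staged_site() {
  local site_dir="${1:-/tmp/site}"
  local index="${site_dir}/hadoop-project/index.html"
  local empty

  # The entry page must exist and be non-empty.
  if [ ! -s "${index}" ]; then
    echo "Missing or empty ${index}" >&2
    return 1
  fi
  # Zero-byte HTML pages likely indicate a page that failed to render.
  empty=$(find "${site_dir}" -type f -name '*.html' -empty)
  if [ -n "${empty}" ]; then
    echo "Empty pages:" >&2
    echo "${empty}" >&2
    return 1
  fi
  echo "Site staged; open file://${index} to review it."
}
```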