Please visit the Google Doc for review; the wiki page will be updated after the release plan is finalized.

This page describes the release, validation, and other aspects related to SDK Harness images.

...

This section describes the naming scheme and the location for publication of the images.

Proposed repository

...

We propose to use gcr.io/beam, created under the apache-beam-testing project, with artifacts publicly accessible. SDK Harness containers go in the sdk folder.

Things to know about gcr.io:

  • Quotas:
    • gcr.io appears to limit only the request rate per client IP. I did not find any documentation of a limit on the number of images, but a storage size limit would still apply.
  • Permissions:
    • download access - public
    • publish - limited to authorized accounts that have correct permissions under apache-beam-testing:
      • publishing the snapshots nightly might be feasible similar to how we currently publish nightly maven snapshots, by creating a Jenkins job;
      • publishing at release time can be another job triggered manually by the release owner;
        • This depends on how many tests we want to run. If we need to run many tests, it should be a Jenkins job; otherwise, we can write a shell script to automate the process.
  • GCP Project:
    • apache-beam-testing

Proposed naming and tagging scheme

...


Image type | Naming | Tagging | Example
Snapshot images | language + language_version | yyyymmdd_{status} in UTC | gcr.io/apache-beam-testing/beam/sdk/snapshot/2019/08/20/java:20190820_verified
Release images | language + language_version* | Beam release version | gcr.io/apache-beam-testing/beam/sdk/release/python2.7:2.10.1

...

gcr.io

...

  • ?

...

/apache-beam-testing

...


/beam/sdk/release/java:2.10.1

*Java and Go image names will not include a language version until we support multiple versions.
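
The naming and tagging scheme can be sketched as a small shell helper that composes an image reference. The function name `image_ref` and its argument order are hypothetical and exist only for illustration; the repository root is the one proposed on this page.

```shell
# Hypothetical helper (not part of Beam): compose an image reference
# following the proposed naming and tagging scheme.
image_ref() {
  kind=$1       # "snapshot" or "release"
  name=$2       # e.g. python2.7, java, go
  tag=$3        # yyyymmdd_{status} for snapshots, Beam version for releases
  date_path=$4  # yyyy/mm/dd, only used for snapshots
  root="gcr.io/apache-beam-testing/beam/sdk"
  case "$kind" in
    snapshot) printf '%s/snapshot/%s/%s:%s\n' "$root" "$date_path" "$name" "$tag" ;;
    release)  printf '%s/release/%s:%s\n' "$root" "$name" "$tag" ;;
    *)        echo "unknown image kind: $kind" >&2; return 1 ;;
  esac
}

# Usage: prints one snapshot reference and one release reference.
image_ref snapshot java 20190820_verified 2019/08/20
image_ref release python2.7 2.10.1
```

Keeping the composition in one place makes it harder for the snapshot and release paths to drift apart between jobs.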

Publication Schedule

Snapshot images

...


...

  • How do we publish the snapshots, and how often?
    • Automatically when HEAD is built, nightly (similar to Maven snapshots), or manually?
  • When and how do we clean up the snapshot images?
    • Each published image gets a new version/hash, and all previous versions remain usable. They take up space and count towards a quota, so we need to clean them up periodically:
      • Make it part of the publish job to find and delete versions that are more than X days (or X versions) old?
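
As a rough sketch of the cleanup idea, the publish job could filter snapshot tags by the date they start with. The function name `stale_tags` and the way the cutoff is passed in are assumptions made for illustration; the tag format is the yyyymmdd_{status} scheme proposed on this page.

```shell
# Hypothetical filter (not part of Beam): read tags from stdin, one per line,
# and print those whose leading yyyymmdd date is earlier than the cutoff.
stale_tags() {
  cutoff=$1                      # yyyymmdd; tags dated before this are stale
  while IFS= read -r tag; do
    day=${tag%%_*}               # drop an optional _{status} suffix
    case "$day" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        if [ "$day" -lt "$cutoff" ]; then
          printf '%s\n' "$tag"
        fi
        ;;
      *) ;;                      # ignore tags that do not start with a date
    esac
  done
  return 0
}

# Usage: with a cutoff of 2019-08-25, only the first two tags are reported.
printf '20190801\n20190820_verified\nlatest\n20190901\n' | stale_tags 20190825
```

The list of tags and the actual deletion would come from the registry tooling (for example `gcloud container images list-tags` and `gcloud container images delete`); how the job wires those together is left open here.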

Release Images

  • How should we build and publish the images for release versions of Beam?
    • Start manually, as part of the Beam release, and automate it later, for example by triggering a job automatically at some step of the release.
  • Should it be a blocker for the release?
    • Should we make it part of the release and not mark the release as complete until the images are published?
      • Or can we publish them in a separate process later/earlier?
        • For the first three releases (v2.16 to v2.18), we can make it optional; if all of these releases go well, we can make it a mandatory part of the release (from v2.19).
      • Should it be done by the same release owner?
        • Yes. This document provides a step-by-step guide to building, testing and pushing. In the future, we can automate this process.
      • Should the validation be part of the release validation?
        • Yes, when we validate the code base.

    Docker images

    Language | Supported versions | Docker image name
    Python | 2.7 | python2.7
    Python | 3.5 | python3.5
    Python | 3.6 | python3.6
    Python | 3.7 | python3.7
    Java | 8 | java
    Java | 11 | not available
    Go | 1.12 | go

    Commands

    This section describes the commands that are used to build, publish, run tests and examples for the images.

    Prerequisites

    Docker should be installed.

    Code Block
    languagebash
    titleconfirm docker is installed
    linenumberstrue
    $ docker -v
    Docker version 18.09.3, build 774a1f4

    Run a test against containers

    This is the last part of the release process. At this point release validation is already finished, so do we need to run the tests again against containers that are generated from the already validated code?

    Run precommit tests locally

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ ./gradlew :sdks:python:test-suites:portable:py2:preCommitPy2
    $ ./gradlew :sdks:python:test-suites:portable:py35:preCommitPy35
    $ ./gradlew :sdks:python:test-suites:portable:py36:preCommitPy36
    $ ./gradlew :sdks:python:test-suites:portable:py37:preCommitPy37


    Code Block
    languagebash
    titleJava
    linenumberstrue
    $ ./gradlew :javaPreCommitPortabilityApi --continue --info


    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ ./gradlew :goPreCommit

     

    Run postcommit tests locally

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ ./gradlew :python2PostCommit
    $ ./gradlew :python35PostCommit
    $ ./gradlew :python36PostCommit
    $ ./gradlew :python37PostCommit


    Code Block
    languagebash
    titleJava
    linenumberstrue
    $ ./gradlew :javaPostCommitPortabilityApi --continue --info


    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ ./gradlew :goPostCommit

    Publish images to GCR

    Publishing images to gcr.io/beam requires permissions in the apache-beam-testing project.

    Please note that this creates new images, not the ones used for the tests above. Since they are built from the same code, we assume the images are identical.

    Set repository and tag as variables:

    Code Block
    languagebash
    titleFor snapshot
    linenumberstrue
    export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/snapshot/`date -u +"%Y/%m/%d"`
    export TAG=`date -u +"%Y%m%d"`
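
Calling `date` twice can produce a repository path and a tag from different days if the job runs across midnight. A minimal sketch that derives both variables from a single UTC date stamp follows; the function name `snapshot_env` is hypothetical, and the optional argument exists only so that a fixed date can be passed in.

```shell
# Hypothetical helper (not part of Beam): set REPOSITORY and TAG for a
# snapshot from one yyyymmdd stamp so that both always refer to the same day.
snapshot_env() {
  stamp=${1:-$(date -u +%Y%m%d)}   # defaults to today's date in UTC
  year=${stamp%????}               # first four characters
  rest=${stamp#????}               # remaining mmdd
  month=${rest%??}
  day=${rest#??}
  REPOSITORY="gcr.io/apache-beam-testing/beam/sdk/snapshot/$year/$month/$day"
  TAG=$stamp
  export REPOSITORY TAG
}

# Usage: with a fixed stamp the result is reproducible.
snapshot_env 20190820
echo "$REPOSITORY $TAG"
```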

    ...

    Code Block
    languagebash
    titleFor release
    linenumberstrue
    export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/release
    export TAG=2.15.0

    To build and push docker images run:

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ pwd
    [...]/beam/
    $ ./gradlew :sdks:python:container:py2:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
    $ ./gradlew :sdks:python:container:py35:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
    $ ./gradlew :sdks:python:container:py36:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
    $ ./gradlew :sdks:python:container:py37:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
    

    ...

    Code Block
    languagebash
    titleJava
    linenumberstrue
    $ ./gradlew :sdks:java:container:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info

    ...

    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ ./gradlew :sdks:go:container:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
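
If the release owner prefers a script over running each command by hand, the per-language pushes could be wrapped in one loop. This is only a sketch: the function name `push_all` and the `GRADLEW` override are hypothetical, and the project paths are copied from the commands on this page. Setting `GRADLEW=echo` prints the commands instead of running them.

```shell
# Container projects taken from the build-and-push commands on this page.
CONTAINER_PROJECTS=":sdks:python:container:py2 :sdks:python:container:py35 :sdks:python:container:py36 :sdks:python:container:py37 :sdks:java:container :sdks:go:container"

# Hypothetical wrapper (not part of Beam): push every container image,
# stopping at the first failure. Expects REPOSITORY and TAG to be set.
push_all() {
  for project in $CONTAINER_PROJECTS; do
    ${GRADLEW:-./gradlew} "$project:dockerPush" \
      -Pdocker-repository-root="$REPOSITORY" \
      -Pdocker-tag="$TAG" --info || return 1
  done
}

# Usage (dry run): GRADLEW=echo REPOSITORY=... TAG=... push_all
```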


    Release Images Validation

    This section describes how to validate a built and/or published image.

    Code Block
    languagebash
    titleMake sure the images are pullable
    linenumberstrue
    $ docker pull $REPOSITORY/python2.7:$TAG
    $ docker pull $REPOSITORY/python3.5:$TAG
    $ docker pull $REPOSITORY/python3.6:$TAG
    $ docker pull $REPOSITORY/python3.7:$TAG
    $ docker pull $REPOSITORY/java:$TAG
    $ docker pull $REPOSITORY/go:$TAG
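
The pull check can also be scripted so that a single run reports every image that is missing. In this sketch the function name `pull_all` and the `DOCKER` override are hypothetical; the image names are the ones listed in the Docker images table. Setting `DOCKER=echo` gives a dry run.

```shell
# Image names from the Docker images table on this page.
IMAGES="python2.7 python3.5 python3.6 python3.7 java go"

# Hypothetical check (not part of Beam): try to pull every image and
# report the ones that fail instead of stopping at the first error.
pull_all() {
  repo=$1
  tag=$2
  failed=""
  for name in $IMAGES; do
    ${DOCKER:-docker} pull "$repo/$name:$tag" || failed="$failed $name"
  done
  if [ -n "$failed" ]; then
    echo "not pullable:$failed" >&2
    return 1
  fi
}

# Usage: pull_all "$REPOSITORY" "$TAG"
```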

    They can be examined by running docker commands:

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ docker image ls
    $ docker run --entrypoint bash -it $REPOSITORY/python3.5:$TAG

    Automated test suites

    We have these test suites in Beam that utilize portability:

    • precommit tests
    • postcommit tests

    *These tests use the default images, not the ones pushed to gcr.

    Manual testing

    Test on Dataflow with uploaded images.

    Code Block
    languagebash
    titlePython
    linenumberstrue
    # This should be run against py2.7, py3.5, py3.6 and py3.7.
    $ pwd
    [...]beam/sdks/python
    $ python -m apache_beam.examples.wordcount \
      --input gs://apache-beam-samples/shakespeare/hamlet.txt \
      --output gs://temp-storage-for-end-to-end-tests/staging-$USER/output \
      --runner DataflowRunner \
      --project apache-beam-testing \
      --temp_location gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
      --worker_harness_container_image $REPOSITORY/python2.7:$TAG \
      --experiment beam_fn_api \
      --sdk_location sdks/python/container/py2/build/target/apache-beam.tar.gz

    Run a test against locally built container

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ bash sdks/python/container/run_validatescontainer.sh python2
    $ bash sdks/python/container/run_validatescontainer.sh python3.5
    $ bash sdks/python/container/run_validatescontainer.sh python3.6
    $ bash sdks/python/container/run_validatescontainer.sh python3.7


    Code Block
    languagebash
    titleJava
    linenumberstrue
    $ ./gradlew :javaPostCommitPortabilityApi --continue --info
    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ ./gradlew :goPostCommit


    Publish

    Publishing an image to gcr.io/beam requires permissions in apache-beam-testing project.

    Code Block
    languagebash
    titlePython
    linenumberstrue
    $ docker push $REPOSITORY/python2.7:$TAG
    $ docker push $REPOSITORY/python3.5:$TAG
    $ docker push $REPOSITORY/python3.6:$TAG
    $ docker push $REPOSITORY/python3.7:$TAG

    Code Block
    languagebash
    titleJava
    linenumberstrue
    $ docker push $REPOSITORY/java:$TAG

    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ docker push $REPOSITORY/go:$TAG

    Manual testing

    To run a pipeline on Dataflow against an uploaded image:

    Code Block
    languagebash
    titleJava
    linenumberstrue
    # get WordCount example code as a maven project
    $ mvn archetype:generate \
          -DarchetypeGroupId=org.apache.beam \
          -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
          -DarchetypeVersion=2.6.0 \
          -DgroupId=org.example \
          -DartifactId=word-count-beam \
          -Dversion="0.1" \
          -Dpackage=org.apache.beam.examples \
          -DinteractiveMode=false

    # run Java project
    $ pwd
    [...]/word-count-beam
    $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="\
      --runner=DataflowRunner \
      --project=apache-beam-testing \
      --stagingLocation=gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
      --workerHarnessContainerImage=$REPOSITORY/java:$TAG \
      --experiments=beam_fn_api \
      --output=gs://temp-storage-for-end-to-end-tests/staging-$USER/output" \
      -Pdataflow-runner

    Code Block
    languagebash
    titleGo
    linenumberstrue
    $ pwd
    [...]/beam/sdks/go
    $ go run examples/wordcount/wordcount.go \
      --runner=dataflow \
      --project=apache-beam-testing \
      --staging_location=gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
      --worker_harness_container_image=$REPOSITORY/go:$TAG \
      --output=gs://temp-storage-for-end-to-end-tests/staging-$USER/output

    Backwards compatibility

    • Do we want new images to be able to run old pipelines? Vice versa?
      • This is decided by the SDK, not container specific.
    • How long do we support backwards compatibility for?
      • This is decided by the SDK, not container specific.

      Other verification

      • Do we sign the artifacts, images?
        • ??
      • Do we check hashes, signatures?
        • No

      Support Story

      How do we allow the images to be used?

      1. Snapshot images can be used for daily testing.
      2. Snapshot images are always created from head, so users can use Beam at head if they want to.
      3. Customize containers on top of published images.
      4. ...

      Do we support users using them in production as is?

      Release images can be used in production. Snapshot images can be used in production, but we don't guarantee they are as stable as release images. 

      Is there some quota for downloading the prebuilt images?

      According to the Google Cloud documentation, gcr.io has the following quota limits.

      Info
      iconfalse

      Any request sent to Container Registry has a 2 hour timeout limit.

      The fixed rate limits per client IP address are:

      • 30,000 HTTP requests every 10 minutes
      • 500,000 HTTP requests per day

      Container Registry uses Cloud Storage for each registry's underlying storage. Cloud Storage Quotas & Limits apply to each registry.
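
To put the two rate limits side by side, they can be converted to requests per second with shell integer arithmetic. This is only a worked calculation on the numbers quoted above; the variable names are illustrative.

```shell
# 30,000 requests per 10 minutes (600 seconds), as requests per second.
burst_rate=$((30000 / 600))

# 500,000 requests per day (86,400 seconds), in hundredths of a request
# per second because shell arithmetic is integer only.
daily_rate_x100=$((500000 * 100 / 86400))

# Minutes of continuous traffic at the full 10-minute rate before the
# daily limit is reached (rounded down).
minutes_to_daily_cap=$((500000 * 10 / 30000))

echo "burst: $burst_rate req/s"
echo "sustained: $((daily_rate_x100 / 100)).$((daily_rate_x100 % 100)) req/s"
echo "daily cap reached after about $minutes_to_daily_cap minutes at burst rate"
```

In other words, a single client IP can burst at 50 requests per second, but only about 5.78 requests per second can be sustained over a full day.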


      Does any special or extra license need to be attached to the images?


      Report bugs

      • If something goes wrong with the release process:
        • ping dev@ 
      • If something goes wrong with a customer pipeline from using the prebuilt images:
        • ping dev@