Please visit the Google Doc for review; this wiki page will be updated after the release plan is finalized.

This page describes the release, validation, and other aspects related to SDKHarness images.

...

This section describes the naming scheme and the publication location for the container images.

Proposed repository

...

We propose to use gcr.io/beam, created under the apache-beam-testing project, with artifacts publicly accessible. SDK Harness containers go in the sdk folder.

Things to know about gcr.io:

  • Quotas:
    • gcr appears to limit only the request rate per client IP. I did not find any documentation of a limit on the number of images, but a storage size limit likely applies.
  • Permissions:
    • download access - public
    • publish - limited to authorized accounts that have correct permissions under apache-beam-testing:
      • publishing the snapshots nightly should be feasible by creating a Jenkins job, similar to how we currently publish nightly Maven snapshots;
      • publishing at release time can be another job, triggered manually by the release owner;
        • This depends on how many tests we want to run. If we need to run many tests, it should be a Jenkins job; otherwise, we can write a shell script to automate the process.
  • GCP Project:
    • apache-beam-testing

Proposed naming and tagging scheme

...


| Image type | Naming | Tagging | Example |
|---|---|---|---|
| Snapshot images | language + language_version | yyyymmdd_{status} in UTC | gcr.io/apache-beam-testing/beam/sdk/snapshot/2019/08/20/java:20190820_verified |
| Release images | language + language_version (Java and Go go without a language version until we support it.) | Beam release version | gcr.io/apache-beam-testing/beam/sdk/release/python2.7:2.10.1 |

Release example for Java: gcr.io/apache-beam-testing/beam/sdk/release/java:2.10.1

*Java and Go will not have a language version until we support multiple versions.
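The naming and tagging scheme can be illustrated with a small shell sketch. The function names and the REGISTRY_ROOT variable are hypothetical helpers introduced here for illustration; only the resulting paths follow the examples given in this section.

```bash
#!/usr/bin/env bash
# Sketch: compose image references that follow the proposed naming scheme.
# REGISTRY_ROOT and both function names are assumptions made for this example.
REGISTRY_ROOT="gcr.io/apache-beam-testing/beam/sdk"

# snapshot_image_ref LANGUAGE YYYYMMDD STATUS
snapshot_image_ref() {
  local language="$1" day="$2" status="$3"
  # Split yyyymmdd into the yyyy/mm/dd folder components.
  local path="${day:0:4}/${day:4:2}/${day:6:2}"
  echo "${REGISTRY_ROOT}/snapshot/${path}/${language}:${day}_${status}"
}

# release_image_ref LANGUAGE BEAM_VERSION
release_image_ref() {
  local language="$1" version="$2"
  echo "${REGISTRY_ROOT}/release/${language}:${version}"
}

snapshot_image_ref java 20190820 verified
release_image_ref python2.7 2.10.1
```

With the inputs shown, the two calls print the snapshot and release examples used in this section.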

Publication Schedule

Snapshot images

...

  • How do we publish the snapshots, and how often?
    • Options: automatically whenever HEAD is built, nightly (similar to Maven snapshots), or manually.
  • When and how do we clean up the snapshot images?
    • Every published image gets a new version/hash, and all previous versions remain usable. They take up space and count towards the quota, so we need to clean them up periodically:
      • make it part of the publish job to find and delete the versions that are more than X days (or X versions) old?
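A minimal sketch of the "older than X days" check that such a cleanup step could use is shown here. It assumes tags start with yyyymmdd as in the naming scheme and that GNU date is available; the function names are hypothetical. Listing and deleting the images themselves (for example with gcloud container images list-tags and gcloud container images delete) is intentionally left out.

```bash
#!/usr/bin/env bash
# Sketch: pick out snapshot tags that fall outside a retention window.
# Assumes tags begin with yyyymmdd; all function names are hypothetical.

# cutoff_day DAYS -> prints the UTC day (yyyymmdd) DAYS days ago. GNU date only.
cutoff_day() {
  date -u -d "$1 days ago" +%Y%m%d
}

# is_stale TAG CUTOFF -> succeeds when the date prefix of TAG is before CUTOFF.
is_stale() {
  local tag_day="${1:0:8}" cutoff="$2"
  # Equal-length yyyymmdd values order correctly as base-10 integers.
  (( 10#$tag_day < 10#$cutoff ))
}

# stale_tags CUTOFF -> reads one tag per line on stdin, prints the stale ones.
stale_tags() {
  local cutoff="$1" tag
  while read -r tag; do
    if is_stale "$tag" "$cutoff"; then
      echo "$tag"
    fi
  done
}

# Example with a fixed cutoff; a real job might use "$(cutoff_day 30)" instead.
printf '%s\n' 20190701_verified 20190815 20190820_verified | stale_tags 20190801
```

Keeping the date comparison separate from the registry calls makes the retention rule easy to check before anything is deleted.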

Release Images

  • How should we build and publish the images for release versions of Beam?
    • Start manually, as part of the Beam release, and automate it later, for example by triggering a job at some step of the release.
  • Should it be a blocker for the release?
    • Should we make it part of the release and not mark the release as complete until the images are published?
      • Or can we publish them in a separate process, later or earlier?
        • For the first three releases (v2.16 to v2.18) we can make it optional; if those releases go well, we can make it a mandatory part of the release from v2.19.
      • Should it be done by the same release owner?
        • Yes. This document provides a step-by-step guide to building, testing, and pushing. In the future, we can automate this process.
      • Should the validation be part of the release validation?

        Commands

        This section describes the commands that are used to build, publish, run tests and examples for the images.

        Prerequisites

        • docker

        Build

        Set repository and tag as variables:

        Snapshot (bash):

        # Snapshot tags are defined in UTC, so use date -u.
        export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/snapshot/$(date -u +"%Y/%m/%d")
        export TAG=$(date -u +"%Y%m%d")

        Release (bash):

        export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/release
        export TAG=2.15.0

        To build docker images run:

        Python (bash):

        $ pwd
        [...]/beam/
        $ ./gradlew :sdks:python:container:py2:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py35:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py37:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info

        Java (bash):

        $ ./gradlew :sdks:java:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info

        Go (bash):

        $ ./gradlew :sdks:go:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
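The six build commands differ only in the Gradle project path, so they can be driven from one loop. The following is a sketch only: DRY_RUN, CONTAINER_PROJECTS and build_all are hypothetical names introduced here, and by default the script prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: print (or run) the container build command for every SDK.
# DRY_RUN, CONTAINER_PROJECTS and build_all are assumptions made for this example;
# the Gradle project paths are the ones used in the commands above.
REPOSITORY="${REPOSITORY:-gcr.io/apache-beam-testing/beam/sdk/release}"
TAG="${TAG:-2.15.0}"
DRY_RUN="${DRY_RUN:-1}"

CONTAINER_PROJECTS=(
  :sdks:python:container:py2
  :sdks:python:container:py35
  :sdks:python:container:py36
  :sdks:python:container:py37
  :sdks:java:container
  :sdks:go:container
)

build_all() {
  local project
  for project in "${CONTAINER_PROJECTS[@]}"; do
    local cmd=(./gradlew "${project}:docker"
               "-Pdocker-repository-root=${REPOSITORY}" "-Pdocker-tag=${TAG}" --info)
    if [ "$DRY_RUN" = "1" ]; then
      # Dry run: only show what would be executed.
      echo "${cmd[*]}"
    else
      # Stop at the first failing build.
      "${cmd[@]}" || return 1
    fi
  done
}

build_all
```

Run it from the Beam repository root with DRY_RUN=0 to execute the builds instead of printing them.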
        Validation of the images is part of the release validation: it happens when we validate the code base.

        Docker images

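        | Language | Supported versions | Docker image name |
        |---|---|---|
        | Python | 2.7 | python2.7 |
        | Python | 3.5 | python3.5 |
        | Python | 3.6 | python3.6 |
        | Python | 3.7 | python3.7 |
        | Java | 8 | java |
        | Java | 11 | not available |
        | Go | 1.12 | go |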

        Docker should be installed. To confirm that Docker is installed (bash):

        $ docker -v
        Docker version 18.09.3, build 774a1f4

        The build commands above produce local images named as follows.

        Images list:

        $REPOSITORY/python2.7:$TAG
        $REPOSITORY/python3.5:$TAG
        $REPOSITORY/python3.6:$TAG
        $REPOSITORY/python3.7:$TAG
        $REPOSITORY/java:$TAG
        $REPOSITORY/go:$TAG

        They can be examined by running docker commands (bash):

        $ docker image ls
        $ docker run --entrypoint bash -it $REPOSITORY/python3.5:$TAG
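To check that all six expected images are actually present after a build, the image names listed above can be generated and probed one by one. This is a sketch under assumptions: expected_images and report_missing are hypothetical helper names, and the probe command is passed in so the logic can be tried without Docker.

```bash
#!/usr/bin/env bash
# Sketch: list the expected local images and report the ones that are missing.
# expected_images and report_missing are assumptions made for this example.
REPOSITORY="${REPOSITORY:-gcr.io/apache-beam-testing/beam/sdk/release}"
TAG="${TAG:-2.15.0}"

# Prints one fully qualified image name per line.
expected_images() {
  local name
  for name in python2.7 python3.5 python3.6 python3.7 java go; do
    echo "${REPOSITORY}/${name}:${TAG}"
  done
}

# report_missing CHECK_CMD... -> prints each expected image for which
# "CHECK_CMD image" fails.
report_missing() {
  local image
  expected_images | while read -r image; do
    # Detach stdin so the check command cannot consume the image list.
    "$@" "$image" </dev/null >/dev/null 2>&1 || echo "missing: $image"
  done
}

expected_images
# With Docker installed: report_missing docker image inspect
```

Here docker image inspect serves as the probe because it exits with a non-zero status when the image does not exist locally.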

        Run tests against containers

        This is the last part of the release process. At this point release validation has already finished, so it is an open question whether we need to run tests again against containers generated from the already validated code.

        Please note that all of the following tests create new images each time, so they do not use the images created above. Because they are built from the same code, we assume the images are identical: if all tests pass with the newly created images, we assume the images created above pass them too.

        Run precommit tests locally

        ...

        Python (bash):

        $ ./gradlew :sdks:python:test-suites:portable:py2:preCommitPy2
        $ ./gradlew :sdks:python:test-suites:portable:py35:preCommitPy35
        $ ./gradlew :sdks:python:test-suites:portable:py36:preCommitPy36
        $ ./gradlew :sdks:python:test-suites:portable:py37:preCommitPy37

        ...


        Java (bash):

        $ ./gradlew :javaPreCommitPortabilityApi --continue --info

        ...

        Go (bash):

        $ ./gradlew :goPreCommit

         

        Run postcommit tests locally

        ...

        Python (bash):

        $ ./gradlew :python2PostCommit
        $ ./gradlew :python35PostCommit
        $ ./gradlew :python36PostCommit
        $ ./gradlew :python37PostCommit

        ...

        Go (bash):

        $ ./gradlew :goPostCommit

        Publish images to GCR

        Publishing images to gcr.io/beam requires permissions in the apache-beam-testing project.

        Please note that this creates new images, not the ones used for the tests above. Because they are built from the same code, we assume the images are identical.

        Set repository and tag as variables:

        For snapshot (bash):

        # Snapshot tags are defined in UTC, so use date -u.
        export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/snapshot/$(date -u +"%Y/%m/%d")
        export TAG=$(date -u +"%Y%m%d")

        For release (bash):

        export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/release
        export TAG=2.15.0

        To build and push docker images run:

        Python (bash):

        $ pwd
        [...]/beam/
        $ ./gradlew :sdks:python:container:py2:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py35:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py36:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        $ ./gradlew :sdks:python:container:py37:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        # Alternatively, push images that were already built locally:
        $ docker push $REPOSITORY/python2.7:$TAG
        $ docker push $REPOSITORY/python3.5:$TAG
        $ docker push $REPOSITORY/python3.6:$TAG
        $ docker push $REPOSITORY/python3.7:$TAG

        Java (bash):

        $ ./gradlew :sdks:java:container:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        # Alternatively, push an image that was already built locally:
        $ docker push $REPOSITORY/java:$TAG

        Go (bash):

        $ ./gradlew :sdks:go:container:dockerPush -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
        # Alternatively, push an image that was already built locally:
        $ docker push $REPOSITORY/go:$TAG


        Release Images Validation

        ...

        We have these test suites in Beam that utilize portability:

        • precommit tests
        • postcommit tests

        *These tests use the default images, not the ones pushed to gcr. The command for executing a test suite X against a specific container Y is still to be determined ($ ./gradlew ... ?).

        Manual testing

        Use the uploaded images and test on Dataflow.

        Python (bash):

        # This should be run against py2.7, py3.5, py3.6 and py3.7.
        $ pwd
        [...]/beam/sdks/python
        $ python -m apache_beam.examples.wordcount \
          --input gs://apache-beam-samples/shakespeare/hamlet.txt \
          --output gs://temp-storage-for-end-to-end-tests/staging-$USER/output \
          --runner DataflowRunner \
          --project apache-beam-testing \
          --temp_location gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
          --worker_harness_container_image $REPOSITORY/python2.7:$TAG \
          --experiment beam_fn_api \
          --sdk_location sdks/python/container/py2/build/target/apache-beam.tar.gz
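Since the same pipeline has to be launched for each Python version, the two version-specific flags can be derived from the version number. The sketch below is an assumption-laden illustration: container_dir and dataflow_flags are hypothetical helpers, and the --sdk_location paths for py35, py36 and py37 are inferred by analogy with the py2 path shown above, not confirmed by this page.

```bash
#!/usr/bin/env bash
# Sketch: derive the version-specific Dataflow flags for each Python version.
# container_dir and dataflow_flags are assumptions made for this example; the
# non-py2 sdk_location paths are guessed by analogy with the py2 one.
REPOSITORY="${REPOSITORY:-gcr.io/apache-beam-testing/beam/sdk/release}"
TAG="${TAG:-2.15.0}"

# container_dir VERSION -> Gradle container project directory (py2, py35, ...).
container_dir() {
  case "$1" in
    2.7) echo py2 ;;
    3.5|3.6|3.7) echo "py${1/./}" ;;   # drop the dot: 3.5 -> py35
    *) echo "unsupported Python version: $1" >&2; return 1 ;;
  esac
}

# dataflow_flags VERSION -> prints the two flags that change per version.
dataflow_flags() {
  local version="$1" dir
  dir="$(container_dir "$version")" || return 1
  echo "--worker_harness_container_image ${REPOSITORY}/python${version}:${TAG}"
  echo "--sdk_location sdks/python/container/${dir}/build/target/apache-beam.tar.gz"
}

for version in 2.7 3.5 3.6 3.7; do
  dataflow_flags "$version"
done
```

Each pair of printed flags replaces the corresponding two flags in the wordcount command when running with the matching Python interpreter.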

        ...

          • Do we want new images to be able to run old pipelines, and vice versa?
            • This is decided by the SDK; it is not container-specific.
          • How long do we support backwards compatibility?
            • This is decided by the SDK; it is not container-specific.

          Other verification

          • Do we sign the artifacts, images?
            • ??
          • Do we check hashes, signatures?
            • No

          Support Story

          How do we allow the images to be used?

          1. Snapshot images can be used for daily testing.
          2. Snapshot images are always created from head, so users can use Beam at head if they want to.
          3. Customize containers on top of published images.
          4. ...

          Do we support users using them in production as is?

          Release images can be used in production. Snapshot images can be used in production, but we don't guarantee they are as stable as release images. 

          Is there some quota for downloading the prebuilt images?

          ...

          According to the Google Cloud documentation, gcr has the following quota limits:

          Any request sent to Container Registry has a 2 hour timeout limit.

          The fixed rate limits per client IP address are:

          • 30,000 HTTP requests every 10 minutes
          • 500,000 HTTP requests per day

          Container Registry uses Cloud Storage for each registry's underlying storage. Cloud Storage Quotas & Limits apply to each registry.
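To put the two per-IP limits on a common scale, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable names are introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: translate the quoted per-IP limits into comparable rates.
BURST_LIMIT=30000      # HTTP requests per 10 minutes
DAILY_LIMIT=500000     # HTTP requests per day
WINDOW_SECONDS=600     # 10 minutes
DAY_SECONDS=86400      # 24 hours

# Peak rate allowed inside a single 10-minute window, in requests per second.
burst_rate=$(( BURST_LIMIT / WINDOW_SECONDS ))
# Average requests per 10-minute window that the daily limit allows (rounded down).
daily_per_window=$(( DAILY_LIMIT * WINDOW_SECONDS / DAY_SECONDS ))

echo "burst: ${burst_rate} requests/s"
echo "sustained: ${daily_per_window} requests per 10 minutes"
```

The result shows that for traffic sustained over a whole day, the daily limit is the tighter of the two: roughly 3,472 requests per 10 minutes on average, compared with a 30,000-request burst allowance.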


          Does any special or extra license need to be attached to the images?


          Report bugs

          • If something goes wrong with the release process:
            • ping dev@ 
          • If something goes wrong with a customer pipeline from using the prebuilt images:
            • ping dev@