This page describes the release, validation, and other aspects related to SDKHarness images.

Overview

The idea is to publish a set of pre-built public SDKHarness images that users can run their portable pipelines with, without building the images themselves, or use as base images for customization.

Background information

  • SDKHarness architecture / design docs ?
  • Image structure ?
  • Source ? 

Location/Naming

This section describes the naming scheme and location for publication of the containers.

Proposed repository:

We propose to use gcr.io/beam, created under the apache-beam-testing project, with artifacts publicly accessible. SDK Harness containers go in the sdk folder.

Things to know about gcr.io:

  • Quotas:
    • ?
  • Permissions:
    • download access - public
    • publish - limited to authorized accounts that have correct permissions under apache-beam-testing:
      • publishing the snapshots nightly might be feasible similar to how we currently publish nightly maven snapshots, by creating a Jenkins job;
      • publishing at release time can be another Jenkins job that is triggered manually by the release owner;
        • what's the process of triggering a job in such case?
  • GCP Project:
    • apache-beam-testing
  • Troubleshooting:
    • If something goes wrong with the release process:
      • ping dev@ ?
    • If something goes wrong with a customer pipeline from using the prebuilt images:
      • ping dev@ ?

Proposed naming and tagging scheme:


  • Snapshot images:
    • Naming: language + language_version
    • Tagging: yyyymmdd_{status} in UTC
    • Example: gcr.io/apache-beam-testing/beam/sdk/snapshot/2019/08/20/java:20190820_verified
  • Release images:
    • Naming: language + language_version (Java and Go go without a language version until we support multiple versions.)
    • Tagging: Beam release version
    • Examples:
      • gcr.io/apache-beam-testing/beam/sdk/release/python2.7:2.10.1
      • gcr.io/apache-beam-testing/beam/sdk/release/java:2.10.1
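
To make the naming scheme above concrete, here is a minimal shell sketch that composes image references from their parts. The function names are hypothetical and not part of the Beam repository; the root path is copied from the examples above.

```shell
# Sketch only: helper names are hypothetical, not part of the Beam repository.
ROOT="gcr.io/apache-beam-testing/beam/sdk"   # root used in the examples above

# snapshot_image LANGUAGE YYYYMMDD STATUS
snapshot_image() {
  # Turn yyyymmdd into the yyyy/mm/dd folder used for snapshots.
  folder="$(printf '%s\n' "$2" | sed 's|^\(....\)\(..\)\(..\)$|\1/\2/\3|')"
  printf '%s/snapshot/%s/%s:%s_%s\n' "$ROOT" "$folder" "$1" "$2" "$3"
}

# release_image LANGUAGE BEAM_VERSION
release_image() {
  printf '%s/release/%s:%s\n' "$ROOT" "$1" "$2"
}
```

For example, `snapshot_image java 20190820 verified` yields the snapshot example given above, and `release_image python2.7 2.10.1` yields the first release example.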

Publication Schedule

Snapshots

  • how do we publish the snapshots, what's the frequency?
    • automatic, when HEAD is built, nightly similar to maven snapshots, or manually?
  • when and how do we clean up the snapshot images:
    • every publish creates a new version/hash, and all previous versions remain usable. They take up space and count towards the quota, so we need to clean them up periodically:
      • make it part of the publish job to find and delete versions that are more than X days (or X versions) old?
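
One possible shape for the age-based cleanup, as a sketch: given a list of snapshot tags, select those older than a cutoff date. The helper name `expired_tags` is hypothetical. How the tag list is obtained from the registry and how images are deleted are deliberately left out, since the page has not settled on that yet.

```shell
# Sketch only: expired_tags is a hypothetical helper.
# Reads snapshot tags (yyyymmdd or yyyymmdd_status), one per line, on stdin and
# prints those whose date part is strictly older than the cutoff (yyyymmdd).
expired_tags() {
  cutoff="$1"
  while IFS= read -r tag; do
    day="${tag%%_*}"            # strip an optional _status suffix
    case "$day" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # exactly 8 digits
      *) continue ;;            # skip tags that are not date-based
    esac
    if [ "$day" -lt "$cutoff" ]; then
      printf '%s\n' "$tag"
    fi
  done
}
```

With GNU date, a cutoff for "X = 30 days" could be computed as `date -u -d '30 days ago' +%Y%m%d`; that flag is GNU-specific, so other platforms would need a different expression.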

Release Images

  • how should we build and publish the images for release versions of Beam?
    • manually, as part of Beam release;
    • or by automatically triggering a job at some step of the release?
  • should it be a blocker for the release?
    • should we make it part of the release and not mark release as complete until the images are published?
      • For the first three releases (v2.16 to v2.18), we can make it optional; if all of these releases go well, we can make it a mandatory part of the release (from v2.19).
    • should it be done by the same release owner?
      • Yes, this document provides a step-by-step guide to building, testing, and pushing the images. In the future, we can automate this process.
    • should the validation be part of the release validation?
      • Yes.

Commands

This section describes the commands that are used to build, publish, run tests and examples for the images.

Prerequisites

  • docker

Build

Set repository and tag as variables:

For snapshot
export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/snapshot/$(date -u +"%Y/%m/%d")
export TAG=$(date -u +"%Y%m%d")
For release
export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/release
export TAG=2.15.0

To build docker images run:

Python
$ pwd
[...]/beam/
$ ./gradlew :sdks:python:container:py2:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py35:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py37:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info


Java
$ ./gradlew :sdks:java:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info


Go
$ ./gradlew :sdks:go:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
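
The six build commands above differ only in the Gradle project, so they can be driven by one loop. The following is a sketch under that assumption; `build_all` and the `DRY_RUN` switch are hypothetical names, not existing Beam tooling.

```shell
# Sketch only: build_all is a hypothetical wrapper around the Gradle tasks above.
# With DRY_RUN=1 (the default here) it prints the commands instead of running them.
build_all() {
  for project in \
    :sdks:python:container:py2 \
    :sdks:python:container:py35 \
    :sdks:python:container:py36 \
    :sdks:python:container:py37 \
    :sdks:java:container \
    :sdks:go:container
  do
    # Assemble the same command line as in the manual steps.
    set -- ./gradlew "${project}:docker" \
      "-Pdocker-repository-root=${REPOSITORY}" "-Pdocker-tag=${TAG}" --info
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$*"
    else
      "$@" || return 1          # stop at the first failing build
    fi
  done
}
```

Run `build_all` from the Beam root to review the commands, then `DRY_RUN=0 build_all` to execute them with REPOSITORY and TAG set as described above.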


This produces the following local images:

docker images list
$ docker images
$REPOSITORY/python2.7:$TAG
$REPOSITORY/python3.5:$TAG
$REPOSITORY/python3.6:$TAG
$REPOSITORY/python3.7:$TAG
$REPOSITORY/java:$TAG
$REPOSITORY/go:$TAG

They can be examined by running docker commands:

$ docker image ls
$ docker run --entrypoint bash -it $REPOSITORY/python3.5:$TAG
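
Before testing or publishing, it may help to confirm that all six images listed above actually exist locally. A minimal sketch, with hypothetical helper names, relying on `docker image inspect` exiting with a non-zero status for an unknown image:

```shell
# Sketch only: both helpers are hypothetical names.
# Prints the six image references the build is expected to produce.
expected_images() {
  for name in python2.7 python3.5 python3.6 python3.7 java go; do
    printf '%s/%s:%s\n' "$REPOSITORY" "$name" "$TAG"
  done
}

# Prints every expected image that is missing locally;
# returns non-zero if at least one is missing.
missing_images() {
  status=0
  for image in $(expected_images); do
    if ! docker image inspect "$image" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$image"
      status=1
    fi
  done
  return "$status"
}
```

Calling `missing_images` after the build prints nothing and returns 0 when every image is present.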

Run a test against locally built container

Note that each of the following tests builds new images, so they do not use the images created above. Because both sets are built from the same code, we assume they are identical: if all tests pass with the newly built images, we assume the images created above pass as well.

Run precommit tests locally:

Python
$ ./gradlew :sdks:python:test-suites:portable:py2:preCommitPy2
$ ./gradlew :sdks:python:test-suites:portable:py35:preCommitPy35
$ ./gradlew :sdks:python:test-suites:portable:py36:preCommitPy36
$ ./gradlew :sdks:python:test-suites:portable:py37:preCommitPy37

 

Java
$ ./gradlew :javaPreCommitPortabilityApi --continue --info
Go
$ ./gradlew :goPreCommit

Run postcommit tests locally:

Python
$ ./gradlew :python2PostCommit
$ ./gradlew :python35PostCommit
$ ./gradlew :python36PostCommit
$ ./gradlew :python37PostCommit
Java
$ ./gradlew :javaPostCommitPortabilityApi --continue --info
Go
$ ./gradlew :goPostCommit

Publish

Publishing an image to gcr.io/beam requires permissions in the apache-beam-testing project.

Python
$ docker push $REPOSITORY/python2.7:$TAG
$ docker push $REPOSITORY/python3.5:$TAG
$ docker push $REPOSITORY/python3.6:$TAG
$ docker push $REPOSITORY/python3.7:$TAG
Java
$ docker push $REPOSITORY/java:$TAG
Go
$ docker push $REPOSITORY/go:$TAG

Release Images Validation

This section describes how to validate a built and/or published image.

Make sure the images are pullable
$ docker pull $REPOSITORY/python2.7:$TAG
$ docker pull $REPOSITORY/java:$TAG
$ docker pull $REPOSITORY/go:$TAG
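
The commands above pull three of the six published images. To cover all of them and get a summary of failures, a small loop can be used. This is a sketch; `check_pullable` is a hypothetical helper name.

```shell
# Sketch only: check_pullable is a hypothetical helper.
# Tries to pull each image given as an argument, reports the result per image,
# and returns non-zero if any pull failed.
check_pullable() {
  failed=0
  for image in "$@"; do
    if docker pull "$image" >/dev/null 2>&1; then
      printf 'ok      %s\n' "$image"
    else
      printf 'FAILED  %s\n' "$image"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `check_pullable "$REPOSITORY/python3.5:$TAG" "$REPOSITORY/python3.6:$TAG" "$REPOSITORY/python3.7:$TAG"`.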

Automated test suites

We have these test suites in Beam that utilize portability:

  • precommit tests(?)
  • postcommit tests(?)

Manual testing

Test on Dataflow with uploaded images.

Python
# This should be run against py2.7, py3.5, py3.6 and py3.7.
$ pwd
[...]/beam/sdks/python
$ python -m apache_beam.examples.wordcount \
  --input gs://apache-beam-samples/shakespeare/hamlet.txt \
  --output gs://temp-storage-for-end-to-end-tests/staging-$USER/output \
  --runner DataflowRunner \
  --project apache-beam-testing \
  --temp_location gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
  --worker_harness_container_image $REPOSITORY/python2.7:$TAG \
  --experiment beam_fn_api \
  --sdk_location sdks/python/container/py2/build/target/apache-beam.tar.gz
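
Since the command above must be repeated for each Python version, only the image name and the SDK tarball path change between runs. The sketch below prints those two flags per version. The helper names are hypothetical, and the mapping from Python version to container directory (py2, py35, py36, py37) is an assumption based on the Gradle project names used on this page; only the py2 tarball path appears in the original command.

```shell
# Sketch only: helper names are hypothetical; the directory mapping is assumed
# from the Gradle project names (py2, py35, py36, py37).
container_dir() {
  case "$1" in
    2.7) echo py2 ;;
    3.5) echo py35 ;;
    3.6) echo py36 ;;
    3.7) echo py37 ;;
    *) echo "unsupported Python version: $1" >&2; return 1 ;;
  esac
}

# Prints the two wordcount flags that differ between Python versions.
wordcount_flags() {
  dir="$(container_dir "$1")" || return 1
  printf '%s\n' \
    "--worker_harness_container_image ${REPOSITORY}/python$1:${TAG}" \
    "--sdk_location sdks/python/container/${dir}/build/target/apache-beam.tar.gz"
}
```

`wordcount_flags 3.6` prints the flags to substitute into the command above when running with a Python 3.6 interpreter.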


Java
# Get the WordCount example code as a Maven project
$ mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.6.0 \
      -DgroupId=org.example \
      -DartifactId=word-count-beam \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false

# Run the Java project
$ pwd
[...]/word-count-beam
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="\
  --runner=DataflowRunner \
  --project=apache-beam-testing \
  --stagingLocation=gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
  --workerHarnessContainerImage=$REPOSITORY/java:$TAG \
  --experiments=beam_fn_api \
  --output=gs://temp-storage-for-end-to-end-tests/staging-$USER/output" \
  -Pdataflow-runner
Go
$ pwd
[...]/beam/sdks/go
$ go run examples/wordcount/wordcount.go \
  --runner=dataflow \
  --project=apache-beam-testing \
  --staging_location=gs://temp-storage-for-end-to-end-tests/staging-$USER/ \
  --worker_harness_container_image=$REPOSITORY/go:$TAG \
  --output=gs://temp-storage-for-end-to-end-tests/staging-$USER/output

Backwards compatibility

  • Do we want new images to be able to run old pipelines?
  • Vice-versa?
  • How long do we support backwards compatibility for?

Other verification

  • Do we sign the artifacts, images?
  • Do we check hashes, signatures?

Support Story

How do we allow the images to be used?

  • Do we support users using them in production as is?
  • Is there some quota for downloading the prebuilt images?

What's the process of reporting issues with the built images?

Any special/extra license needs to be attached to the images?










