This page describes the release and validation of SDKHarness images, along with related topics such as naming, publication, and support.
Overview
The idea is to publish a set of pre-built, public SDKHarness images. Users can run their portable pipelines on these images without building them manually, or use them as base images for customization.
Background information
- SDKHarness architecture / design docs ?
- Image structure ?
- Source ?
Location/Naming
This section describes the naming scheme and location for publication of the containers.
Proposed repository:
gcr.io/beam, created under the apache-beam-testing project, with artifacts publicly accessible. SDK Harness containers go in the sdk folder.
Proposed naming and tagging scheme:
Image type | Naming | Tagging | Example |
---|---|---|---|
Snapshot images | language + language_version | yyyymmdd_{status} in UTC | gcr.io/apache-beam-testing/beam/sdk/snapshot/yyyy/mm/dd/java:20190820_verified |
Release images | language + language_version (Java and Go go without language version until we support it.) | Beam release version | gcr.io/apache-beam-testing/beam/sdk/release/python2.7:2.10.1 |
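The naming scheme in the table can be sketched as a small shell helper. This is only an illustration under the proposed scheme: `image_ref` and `REGISTRY_ROOT` are hypothetical names, and the snapshot date path is assumed to be derived from the yyyymmdd prefix of the tag.

```shell
# Hypothetical helper: compose an image reference following the proposed scheme.
# Usage: image_ref snapshot|release <image-name> <tag>
REGISTRY_ROOT="gcr.io/apache-beam-testing/beam/sdk"

image_ref() {
  local kind="$1"
  local name="$2"
  local tag="$3"
  local day
  case "$kind" in
    snapshot)
      # Drop an optional _{status} suffix, leaving the yyyymmdd part of the tag.
      day="${tag%%_*}"
      # Snapshots are grouped under snapshot/yyyy/mm/dd/.
      printf '%s/snapshot/%s/%s/%s/%s:%s\n' "$REGISTRY_ROOT" \
        "$(printf '%s' "$day" | cut -c1-4)" \
        "$(printf '%s' "$day" | cut -c5-6)" \
        "$(printf '%s' "$day" | cut -c7-8)" \
        "$name" "$tag"
      ;;
    release)
      printf '%s/release/%s:%s\n' "$REGISTRY_ROOT" "$name" "$tag"
      ;;
    *)
      echo "unknown image kind: $kind" >&2
      return 1
      ;;
  esac
}

image_ref snapshot java 20190820_verified
image_ref release python2.7 2.10.1
```

The two calls at the end reproduce the example references from the table.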
Gcr.io
Things to know about gcr.io:
- Quotas:
- ?
- Permissions:
- download access - public
- publish - limited to authorized accounts that have correct permissions under apache-beam-testing:
- publishing the snapshots nightly might be feasible similar to how we currently publish nightly maven snapshots, by creating a Jenkins job;
- publishing at release time can be another Jenkins job that is triggered manually by the release owner;
- what's the process of triggering a job in such case?
- GCP Project:
- apache-beam-testing
- Troubleshooting:
- If something goes wrong with the release process:
- ping dev@ ?
- If something goes wrong with a customer pipeline from using the prebuilt images:
- ping dev@ ?
Publication Schedule
Snapshots
- how do we publish the snapshots, what's the frequency?
- automatic, when HEAD is built, nightly similar to maven snapshots, or manually?
- when and how do we cleanup the snapshot images:
- each published image gets a new version/hash, and all previous versions remain usable. Old versions take up space and count towards a quota, so we need to clean them up periodically:
- make it part of the publish job to look and delete the versions that are more than X days (or versions) old?
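One way the "delete versions older than X days" idea could work is sketched below. It assumes snapshot tags start with a yyyymmdd date, as in the proposed tagging scheme; `expired_tags` is a hypothetical helper, and it only selects candidates without deleting anything.

```shell
# Hypothetical helper: read tags from stdin (one per line) and print those whose
# yyyymmdd prefix is strictly older than the cutoff date given as $1.
expired_tags() {
  local cutoff="$1"
  local tag day
  while IFS= read -r tag; do
    # Drop an optional _{status} suffix.
    day="${tag%%_*}"
    # Ignore anything that is not an 8-digit date tag (for example "latest").
    case "$day" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
      *) continue ;;
    esac
    if [ "$day" -lt "$cutoff" ]; then
      printf '%s\n' "$tag"
    fi
  done
}

# Example: with a cutoff of 20190820, only the first two tags are selected.
printf '%s\n' 20190701_verified 20190819 20190820_verified latest | expired_tags 20190820
```

In a publish job, the tag list would presumably come from the registry (for example via `gcloud container images list-tags`), and each selected tag would be passed to a delete command such as `gcloud container images delete`. The exact invocation and the cutoff computation are left open here.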
Release Images
- how should we build and publish the images for release versions of Beam?
- manually, as part of Beam release;
- or by automatically triggering a job at some step of the release?
- should it be a blocker for the release?
- should we make it part of the release and not mark release as complete until the images are published?
- or can we publish them in a separate process later/earlier?
- should it be done by the same release owner?
- should the validation be part of the release validation?
Commands
This section describes the commands that are used to build, publish, run tests and examples for the images.
Prerequisites
- docker
Build
Set repository and tag as variables:
# Snapshot images:
export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/snapshot/`date +"%Y/%m/%d"`
export TAG=`date +"%Y%m%d"`
# Release images:
export REPOSITORY=gcr.io/apache-beam-testing/beam/sdk/release
export TAG=2.15.0
To build docker images run:
$ pwd
...beam/
$ ./gradlew :sdks:python:container:py2:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py35:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:python:container:py37:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:java:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
$ ./gradlew :sdks:go:container:docker -Pdocker-repository-root=$REPOSITORY -Pdocker-tag=$TAG --info
This produces local images named:
$ docker images
$REPOSITORY/python2.7:$TAG
$REPOSITORY/python3.5:$TAG
$REPOSITORY/python3.6:$TAG
$REPOSITORY/python3.7:$TAG
$REPOSITORY/java:$TAG
$REPOSITORY/go:$TAG
They can be examined by running docker commands:
$ docker image ls
$ docker run --entrypoint bash -it $REPOSITORY/python3.5:$TAG
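Before testing or publishing, it may help to confirm that all six images from the build step exist locally. The following is a minimal sketch: `expected_images` and `check_local_images` are hypothetical helpers, and the image names are taken from the list above.

```shell
# Hypothetical helper: print the image references the build step is expected to produce.
expected_images() {
  local repo="$1"
  local tag="$2"
  local name
  for name in python2.7 python3.5 python3.6 python3.7 java go; do
    printf '%s/%s:%s\n' "$repo" "$name" "$tag"
  done
}

# Hypothetical helper: report every expected image that the local Docker daemon
# does not know about. Returns non-zero if at least one image is missing.
check_local_images() {
  local image
  local missing=0
  for image in $(expected_images "$1" "$2"); do
    if ! docker image inspect "$image" >/dev/null 2>&1; then
      echo "missing: $image" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_local_images "$REPOSITORY" "$TAG"`, using the variables set earlier.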
Run a test against a locally built container
Running Dataflow tests with a custom container requires GCR and Dataflow permissions in a cloud project, plus a GCS location. By default these are apache-beam-testing and gs://temp-storage-for-end-to-end-tests, but they can be overridden with the PROJECT and GCS_LOCATION environment variables.
$ bash sdks/python/container/run_validatescontainer.sh python2
$ bash sdks/python/container/run_validatescontainer.sh python3.5
$ bash sdks/python/container/run_validatescontainer.sh python3.6
$ bash sdks/python/container/run_validatescontainer.sh python3.7
$ ./gradlew :javaPostCommitPortabilityApi --continue --info
$ ./gradlew :goPostCommit
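The PROJECT and GCS_LOCATION overrides mentioned above can be illustrated as follows. This sketch only shows the default-or-override behavior described in the text: `effective_settings` is a hypothetical helper and is not part of the validation script.

```shell
# Hypothetical helper: print the project and GCS location a validation run would use.
# The defaults are the ones named in the text; PROJECT and GCS_LOCATION override them.
effective_settings() {
  printf 'PROJECT=%s\n' "${PROJECT:-apache-beam-testing}"
  printf 'GCS_LOCATION=%s\n' "${GCS_LOCATION:-gs://temp-storage-for-end-to-end-tests}"
}

# Example override (my-gcp-project and gs://my-bucket/tmp are placeholders):
#   PROJECT=my-gcp-project GCS_LOCATION=gs://my-bucket/tmp \
#     bash sdks/python/container/run_validatescontainer.sh python3.7
```

Setting the variables on the command line, as in the commented example, keeps the override scoped to that single run.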
Run precommit tests locally
$ ./gradlew :sdks:python:test-suites:portable:py2:preCommitPy2
$ ./gradlew :sdks:python:test-suites:portable:py35:preCommitPy35
$ ./gradlew :sdks:python:test-suites:portable:py36:preCommitPy36
$ ./gradlew :sdks:python:test-suites:portable:py37:preCommitPy37
$ ./gradlew :javaPreCommitPortabilityApi --continue --info
$ ./gradlew :goPreCommit
Publish
Publishing an image to gcr.io/beam requires permissions in the apache-beam-testing project.
$ docker push $REPOSITORY/python2.7:$TAG
$ docker push $REPOSITORY/python3.5:$TAG
$ docker push $REPOSITORY/python3.6:$TAG
$ docker push $REPOSITORY/python3.7:$TAG
$ docker push $REPOSITORY/java:$TAG
$ docker push $REPOSITORY/go:$TAG
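If publishing moves into a Jenkins job, the six push commands could be wrapped in a loop with a dry-run switch. This is only a sketch: `push_all` and the `DRY_RUN` variable are hypothetical and not part of the Beam build.

```shell
# Hypothetical helper: push every SDK harness image for the given repository and tag.
# With DRY_RUN=1 the commands are printed instead of executed.
push_all() {
  local repo="$1"
  local tag="$2"
  local name
  for name in python2.7 python3.5 python3.6 python3.7 java go; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker push $repo/$name:$tag"
    else
      # Stop at the first failed push so a partial publish is noticed.
      docker push "$repo/$name:$tag" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 push_all "$REPOSITORY" "$TAG"` first shows exactly which references would be published.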
Release Images Validation
This section describes how to validate a built and/or published image.
Automated test suites
We have these test suites in Beam that utilize portability:
- ...
- ...
To execute a test suite X against container Y run this command:
$ ./gradlew ... ?
Manual testing
To run a custom pipeline against an image:
$ ./gradlew ... ?
Backwards compatibility
- Do we want new images to be able to run old pipelines?
- Vice-versa?
- How long do we support backwards compatibility for?
Other verification
- Do we sign the artifacts, images?
- Do we check hashes, signatures?
Support Story
How do we allow the images to be used?
- Do we support users using them in production as is?
- Is there some quota for downloading the prebuilt images?
What's the process of reporting issues with the built images?
Does any special/extra license need to be attached to the images?