This page describes the release, validation, and other processes related to prebuilt SDKHarness container images.
Overview
The idea is to publish a set of prebuilt public SDKHarness images that users can use to run their portable pipelines without having to build the images themselves, or use as base images for customization.
Background information
- SDKHarness architecture / design docs ?
- Image structure ?
- Source ?
Location/Naming
This section describes the naming scheme and location for publication of the containers.
Proposed repository:
gcr.io/beam/sdk, created under apache-beam-testing project, artifacts accessible publicly
Proposed naming and tagging scheme:
| Image type | Naming | Tagging | Example |
|---|---|---|---|
| Snapshot images | language + language_version | yyyymmdd-hhmmss in UTC | gcr.io/apache-beam-testing/beam/sdk/java8:20190820-020000 |
| Release images | language + language_version | Beam release version | gcr.io/apache-beam-testing/beam/sdk/python2.7:2.10.1 |
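A snapshot tag in the scheme above can be generated with a one-liner. This is only a sketch; the full image name printed below is a hypothetical instance of the proposed repository layout, not a published image:

```shell
# Generate a snapshot tag in the proposed yyyymmdd-hhmmss UTC format.
TAG="$(date -u +%Y%m%d-%H%M%S)"
# Hypothetical full image name following the proposed layout:
echo "gcr.io/apache-beam-testing/beam/sdk/java8:${TAG}"
```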
gcr.io
Things to know about gcr.io:
- Quotas:
- ?
- Permissions:
- download access - public
- publish - limited to authorized accounts that have correct permissions under apache-beam-testing:
- publishing the snapshots nightly should be feasible via a Jenkins job, similar to how we currently publish nightly Maven snapshots;
- publishing at release time can be another Jenkins job that is triggered manually by the release owner;
- what's the process of triggering a job in such case?
- GCP Project:
- apache-beam-testing
- Troubleshooting:
- If something goes wrong with the release process:
- ping dev@ ?
- If something goes wrong with a customer pipeline from using the prebuilt images:
- ping dev@ ?
Publication Schedule
Snapshots
- how do we publish the snapshots, what's the frequency?
- automatically whenever HEAD is built, nightly (similar to Maven snapshots), or manually?
- when and how do we cleanup the snapshot images:
- each time an image is published it gets a new version/hash, and all previous versions remain usable. Old versions take up space and count towards quota, so we need to clean them up periodically:
- make it part of the publish job to find and delete versions that are more than X days (or X versions) old?
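One way to implement the periodic cleanup is a small script around gcloud. The sketch below is a dry run that only prints the commands it would execute; the image path and the 30-day retention window are assumptions for illustration, not decisions:

```shell
# Dry-run sketch of snapshot cleanup: prints the gcloud commands it would run.
# The image path and the 30-day window are hypothetical.
set -eu
IMAGE="gcr.io/apache-beam-testing/beam/sdk/python3"
CUTOFF="$(date -u -d '-30 days' +%Y-%m-%d)"
# List digests of snapshot versions older than the cutoff date:
echo gcloud container images list-tags "$IMAGE" \
  --filter="timestamp.datetime < ${CUTOFF}" --format='get(digest)'
# Then delete each returned digest together with its tags:
echo gcloud container images delete "${IMAGE}@sha256:<digest>" \
  --force-delete-tags --quiet
```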
Release Images
- how should we build and publish the images for release versions of Beam?
- manually, as part of Beam release;
- or by automatically triggering a job at some step of the release?
- should it be a blocker for the release?
- should we make it part of the release and not mark release as complete until the images are published?
- or can we publish them in a separate process later/earlier?
- should it be done by the same release owner?
- should the validation be part of the release validation?
Commands
This section describes the commands that are used to build, publish, run tests and examples for the images.
Prerequisites
- docker
Build
To build docker images run:
$ ./gradlew :sdks:python:container:py3:docker :sdks:python:container:docker :sdks:java:container:docker :sdks:go:container:docker --info
This produces local images named
${USER}-docker-apache.bintray.io/beam/python3
${USER}-docker-apache.bintray.io/beam/python
${USER}-docker-apache.bintray.io/beam/java
${USER}-docker-apache.bintray.io/beam/go
You can set the repository root and tag when building with -Pdocker-repository-root= and -Pdocker-tag= on the Gradle command line, for example (the repository root and tag here are placeholders):
$ ./gradlew :sdks:python:container:py3:docker -Pdocker-repository-root=gcr.io/my-project/beam -Pdocker-tag=my-tag
They can be examined by running docker commands:
$ docker image ls
$ docker run --entrypoint bash -it ${USER}-docker-apache.bintray.io/beam/python3:latest
Run a test against locally built container
Running Dataflow tests with a custom container requires GCR and Dataflow permissions in a cloud project and a GCS location. By default, these are apache-beam-testing and gs://temp-storage-for-end-to-end-tests, but they can be overridden with the PROJECT and GCS_LOCATION environment variables.
$ bash sdks/python/container/run_validatescontainer.sh python2
$ bash sdks/python/container/run_validatescontainer.sh python3
$ ./gradlew :javaPostCommitPortabilityApi --continue --info
TODO: how to run against the prebuilt container?
$ ./gradlew ... ?
Publish
Publishing an image to gcr.io/beam requires permissions in apache-beam-testing project.
To publish an image X run this command:
$ ./gradlew ... ?
To publish an image Y to a custom repository run this command:
$ ./gradlew ... ?
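Until a dedicated Gradle publish task exists, the manual flow is an ordinary docker tag/push of a locally built image. The sketch below is a dry run (it echoes the commands by default); the local and remote image names are hypothetical, based on the Build section and the proposed naming scheme:

```shell
# Dry-run sketch of manually publishing a locally built image to gcr.io.
# Set DRY_RUN=0 to actually execute (requires publish rights on the project).
set -eu
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# Hypothetical names: local image from the Build step, remote per the proposal.
LOCAL_IMAGE="${USER:-dev}-docker-apache.bintray.io/beam/python3:latest"
REMOTE_IMAGE="gcr.io/apache-beam-testing/beam/sdk/python3:$(date -u +%Y%m%d-%H%M%S)"

run docker tag "$LOCAL_IMAGE" "$REMOTE_IMAGE"   # re-tag the local build
run docker push "$REMOTE_IMAGE"                 # push to the shared repository
```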
Release Images Validation
This section describes how to validate a built and/or published image.
Automated test suites
We have these test suites in Beam that utilize portability:
- ...
- ...
To execute a test suite X against container Y run this command:
$ ./gradlew ... ?
Manual testing
To run a custom pipeline against an image:
$ ./gradlew ... ?
Backwards compatibility
- Do we want new images to be able to run old pipelines?
- Vice-versa?
- How long do we support backwards compatibility for?
Other verification
- Do we sign the artifacts, images?
- Do we check hashes, signatures?
Support Story
How do we allow the images to be used?
- Do we support users using them in production as is?
- Is there some quota for downloading the prebuilt images?
What's the process of reporting issues with the built images?
Do any special/extra licenses need to be attached to the images?