This page describes the release, validation, and other aspects related to SDKHarness images.


Overview

The idea is to build a set of public pre-built SDKHarness images that users can use to run their portable pipelines without having to build them manually, or that they can use as base images for customization.

Background information

  • SDKHarness architecture / design docs ?
  • Image structure ?
  • Source ? 

Location/Naming

This section describes the naming scheme and location for publication of the containers.

Proposed repository:

gcr.io/beam, created under the apache-beam-testing project, with artifacts publicly accessible. SDK Harness containers go into the sdk folder.

Proposed naming and tagging scheme:


  Image type        Naming                        Tagging                   Example
  Snapshot image    language + language_version   yyyymmdd-hhmmss in UTC    gcr.io/apache-beam-testing/beam/sdk/java8:20190820-020000
  Release images    language + language_version   Beam release version      gcr.io/apache-beam-testing/beam/sdk/python2.7:2.10.1
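
As an illustration of the scheme, assuming an image has already been built locally under its default name (see the Build section below), retagging it for the proposed repository would look roughly like this; the timestamp tag is hypothetical:

  $ docker tag ${USER}-docker-apache.bintray.io/beam/java \
      gcr.io/apache-beam-testing/beam/sdk/java8:20190820-020000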


gcr.io

Things to know about gcr.io:

  • Quotas:
    • ?
  • Permissions:
    • download access - public
    • publish - limited to authorized accounts that have the correct permissions under apache-beam-testing (see the authentication sketch after this list):
      • publishing the snapshots nightly should be feasible by creating a Jenkins job, similar to how we currently publish nightly Maven snapshots;
      • publishing at release time can be another Jenkins job, triggered manually by the release owner;
        • what's the process of triggering a job in such a case?
  • GCP Project:
    • apache-beam-testing
  • Troubleshooting:
    • If something goes wrong with the release process:
      • ping dev@ ?
    • If something goes wrong with a customer pipeline that uses the prebuilt images:
      • ping dev@ ?
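
A minimal sketch of how an authorized account could authenticate Docker against gcr.io before publishing, assuming the Google Cloud SDK (gcloud) is installed and the account has the permissions described above:

  # log in and register gcr.io with Docker's credential helpers
  $ gcloud auth login
  $ gcloud auth configure-docker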

Publication Schedule

Snapshots

  • how do we publish the snapshots, and at what frequency?
    • automatically when HEAD is built, nightly (similar to Maven snapshots), or manually?
  • when and how do we clean up the snapshot images:
    • each published image adds a new version/hash, and all previous versions remain usable; old versions take up space and count towards the quota, so we need to clean them up periodically:
      • make it part of the publish job to find and delete versions that are more than X days (or X versions) old? (a possible command sequence is sketched below)
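
A rough sketch of what a nightly publish step could run, assuming the images were already built locally under their default names (only the Java image is shown); the cleanup part merely illustrates listing and deleting old tags with gcloud and is not a finalized policy:

  $ TAG=$(date -u +%Y%m%d-%H%M%S)
  $ docker tag ${USER}-docker-apache.bintray.io/beam/java \
      gcr.io/apache-beam-testing/beam/sdk/java8:${TAG}
  $ docker push gcr.io/apache-beam-testing/beam/sdk/java8:${TAG}

  # list existing tags, oldest first, to decide what to clean up
  $ gcloud container images list-tags gcr.io/apache-beam-testing/beam/sdk/java8 --sort-by=TIMESTAMP

  # delete the image behind a specific old tag (tag value is hypothetical; may prompt for confirmation)
  $ gcloud container images delete gcr.io/apache-beam-testing/beam/sdk/java8:20190101-020000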

Release Images

  • how should we build and publish the images for release versions of Beam? (a possible command sequence is sketched after this list)
    • manually, as part of Beam release;
    • or by automatically triggering a job at some step of the release?
  • should it be a blocker for the release?
    • should we make it part of the release and not mark release as complete until the images are published?
    • or can we publish them in a separate process later/earlier?
    • should it be done by the same release owner?
    • should the validation be part of the release validation?
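
If publishing is done manually by the release owner, the commands could look much like the snapshot sketch above, with the Beam release version as the tag (image name and version here are hypothetical):

  $ docker tag ${USER}-docker-apache.bintray.io/beam/python \
      gcr.io/apache-beam-testing/beam/sdk/python2.7:2.10.1
  $ docker push gcr.io/apache-beam-testing/beam/sdk/python2.7:2.10.1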

Commands

This section describes the commands that are used to build, publish, run tests and examples for the images.


Prerequisites

  • docker

Build

To build the docker images, run:

  $ ./gradlew :sdks:python:container:py3:docker :sdks:python:container:docker :sdks:java:container:docker :sdks:go:container:docker --info

This produces local images named

     ${USER}-docker-apache.bintray.io/beam/python3
     ${USER}-docker-apache.bintray.io/beam/python
     ${USER}-docker-apache.bintray.io/beam/java
     ${USER}-docker-apache.bintray.io/beam/go

You can set the repository root and tag at build time by passing -Pdocker-repository-root= and -Pdocker-tag= on the Gradle command line, as in the example below.
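
For example, to build the Java container image directly under the proposed repository with a timestamp tag (both values here are just placeholders):

  $ ./gradlew :sdks:java:container:docker -Pdocker-repository-root=gcr.io/apache-beam-testing/beam/sdk -Pdocker-tag=20190820-020000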

The images can be examined with standard docker commands:

  $ docker image ls
  $ docker run --entrypoint bash -it ${USER}-docker-apache.bintray.io/beam/python3:latest
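
For a quick sanity check of the interpreter shipped inside a container, something along these lines should work (the image name assumes the default build above and that the entrypoint can be overridden with a plain python binary):

  $ docker run --rm --entrypoint python ${USER}-docker-apache.bintray.io/beam/python3:latest --version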

Run a test against locally built container

Running Dataflow tests with a custom container requires GCR and Dataflow permissions in a cloud project, as well as a GCS location. By default these are apache-beam-testing and gs://temp-storage-for-end-to-end-tests, but they can be overridden with the PROJECT and GCS_LOCATION environment variables (see the example after the commands below).

  $ bash sdks/python/container/run_validatescontainer.sh python2

  $ bash sdks/python/container/run_validatescontainer.sh python3

  $ ./gradlew :javaPostCommitPortabilityApi --continue --info
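
For example, to point the Python container tests at a different project and GCS location, set the environment variables inline (both values are hypothetical):

  $ PROJECT=my-gcp-project GCS_LOCATION=gs://my-bucket/tmp bash sdks/python/container/run_validatescontainer.sh python3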



TODO: how to run against the prebuilt container?

  $ ./gradlew ... ?

Publish

Publishing an image to gcr.io/beam requires permissions in the apache-beam-testing project.

This is the command that publishes an image X:
  $ ./gradlew ... ?

To publish an image Y to a custom repository run this command:

  $ ./gradlew ... ?

Release Images Validation

This section describes how to validate a built and/or published image.

Automated test suites

We have these test suites in Beam that utilize portability:

  • ...
  • ...

To execute a test suite X against container Y run this command:

  $ ./gradlew ... ?

Manual testing

To run a custom pipeline (?) against an image (?):

  $ ./gradlew ... ?

Backwards compatibility

  • Do we want new images to be able to run old pipelines?
  • Vice-versa?
  • How long do we support backwards compatibility for?

Other verification

  • Do we sign the artifacts, images?
  • Do we check hashes, signatures?
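
Until this is decided, one low-tech option is to refer to published images by digest rather than by tag, which pins the exact content being pulled; a sketch (image name and digest are hypothetical):

  # print the digest(s) recorded for a pulled image
  $ docker inspect --format='{{index .RepoDigests 0}}' gcr.io/apache-beam-testing/beam/sdk/java8:2.10.1

  # pull by digest to guarantee the exact same image
  $ docker pull gcr.io/apache-beam-testing/beam/sdk/java8@sha256:<digest>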

Support Story

How do we allow the images to be used?

  • Do we support users running them in production as is?
  • Is there some quota for downloading the prebuilt images?

What's the process of reporting issues with the built images?

Does any special/extra license need to be attached to the images?










