...

  1. Use the following code:

    Code Block
    languagebash
    # Initialize a virtual environment called "env" in ~/.virtualenvs or any other directory. (Consider using pyenv to manage the Python version as well as the packages installed in your virtual environment.)
    $ python3 -m venv ~/.virtualenvs/env
    
    # Activate virtual environment.
    $ . ~/.virtualenvs/env/bin/activate
    
    # Upgrade other tools. (Optional)
    (env) $ pip install --upgrade pip
    (env) $ pip install --upgrade setuptools
    
    # Install setup.py requirements.
    (env) $ pip install -r build-requirements.txt
    
    # Install Apache Beam package in editable mode.
    (env) $ pip install -e ".[gcp,test]"
    
    

    On certain systems, particularly Macs with M1 chips, this installation method may not generate URNs correctly. If running python gen_protos.py doesn't resolve the issue, consult https://github.com/apache/beam/issues/22742#issuecomment-1218216468 for further guidance.
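The install commands above assume the virtual environment is active. As a minimal sketch, a guard like the following could be run before the `pip install` steps. `require_venv` is a hypothetical helper name, not part of Beam, and it relies on the `VIRTUAL_ENV` variable that the venv `activate` script exports.

```shell
# Hypothetical helper: succeed only when a virtual environment is active.
require_venv() {
  # The venv "activate" script exports VIRTUAL_ENV; it is unset otherwise.
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment is active; run: . ~/.virtualenvs/env/bin/activate" >&2
    return 1
  fi
  echo "Using virtual environment: $VIRTUAL_ENV"
}
```

For example, `require_venv && pip install -e ".[gcp,test]"` would skip the install when no environment is active.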

...

Code Block
languagebash
cd sdks/python/
pip install build && python -m build --sdist

We will use the tarball built by this command in the --sdk_location parameter.
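By default, `python -m build --sdist` writes the tarball to a `dist/` directory. The sketch below picks the most recently modified Beam sdist from that directory so its path can be passed to `--sdk_location`. `latest_sdist` is a hypothetical helper name, and the glob is an assumption: depending on the setuptools version, the file name may use a hyphen or an underscore.

```shell
# Hypothetical helper: print the newest Beam sdist in a directory (default: dist).
latest_sdist() {
  local dist_dir newest
  dist_dir="${1:-dist}"
  # ls -t sorts by modification time, newest first; the glob accepts both
  # "apache-beam-*" and "apache_beam-*" file names.
  newest=$(ls -t "$dist_dir"/apache[-_]beam-*.tar.gz 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "No sdist found in $dist_dir" >&2
    return 1
  fi
  echo "$newest"
}
```

A pipeline invocation could then use `--sdk_location "$(latest_sdist dist)"` instead of a hard-coded file name.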

...

Code Block
languagebash
# Build portable worker
./gradlew :runners:google-cloud-dataflow-java:worker:build -x spotlessJava -x rat -x test
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar

# Build portable Python SDK harness and publish it to GCP
./gradlew -Pdocker-repository-root=gcr.io/dataflow-build/$USER/beam -p sdks/python/container docker
gcloud docker -- push gcr.io/dataflow-build/$USER/beam/python:latest

# Initialize python
cd sdks/python
virtualenv env
. ./env/bin/activate

# Run pipeline
python -m apache_beam.examples.wordcount \
  --runner DataflowRunner \
  --num_workers 1 \
  --project <gcp_project_name> \
  --output <gs://path> \
  --temp_location <gs://path> \
  --sdk_harness_container_image gcr.io/dataflow-build/$USER/beam/python:latest \
  --experiment beam_fn_api \
  --sdk_location build/apache-beam-2.12.0.dev0.tar.gz \
  --debug
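
The `--output` and `--temp_location` placeholders in the command expect Cloud Storage paths. As a rough sketch, a pre-flight check like the one below can catch a missing `gs://` prefix before a job is submitted. `is_gcs_path` is a hypothetical helper name, and the check is purely syntactic: it does not confirm that the bucket exists or is writable.

```shell
# Hypothetical helper: succeed if the argument looks like gs://<something>.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;  # "gs://" followed by at least one character
    *) return 1 ;;
  esac
}
```

For example, `is_gcs_path "$TEMP_LOCATION" || echo "temp location must start with gs://" >&2`, where `TEMP_LOCATION` is an illustrative variable name.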

...

  1. From this list, click on a recent `Build python source distribution and wheels` job that ran successfully on the github.com/apache/beam master branch.
  2. Click on List files on Google Cloud Storage Bucket on the right-side panel.
  3. Expand List files on Google Cloud Storage Bucket in the main panel.
  4. Locate and download the archive from GCS. For example, apache-beam-2.52.0.dev0.tar.gz.
  5. Install the downloaded archive. For example:

    Code Block
    languagebash
    titleSimpleTest
    pip install apache-beam-2.52.0.dev0.tar.gz
    # Or, if you need extra dependencies:
    pip install apache-beam-2.52.0.dev0.tar.gz[aws,gcp]


  6. When you run your Beam pipeline, pass in the --sdk_location flag pointed at the same archive.


    Code Block
    languagebash
    titleSimpleTest
    --sdk_location=apache-beam-2.52.0.dev0.tar.gz
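
Because the installed package and the `--sdk_location` archive should be the same build, it can help to compare their versions before launching. The sketch below extracts the version string from an archive file name. `sdist_version` is a hypothetical helper name, and it assumes the file is named `apache-beam-<version>` or `apache_beam-<version>` with a `.tar.gz` or `.zip` suffix.

```shell
# Hypothetical helper: print the version embedded in a Beam archive file name.
sdist_version() {
  local name
  name=$(basename "$1")
  # Strip the archive suffix, whichever one is present.
  name=${name%.tar.gz}
  name=${name%.zip}
  # Strip the project-name prefix (hyphen or underscore form).
  name=${name#apache-beam-}
  name=${name#apache_beam-}
  echo "$name"
}
```

The result can be compared with the output of `python -c "import apache_beam; print(apache_beam.__version__)"` in the active environment.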


How to update dependencies that are installed in Python container images 

...