...
- You can download, build and install CPython from sources.
- If you are an Ubuntu user, you could add a third-party repository 'Deadsnakes' and install the missing versions via apt. If you install from Deadsnakes, make sure to also install the python#.#-dev, python#.#-venv and python#.#-distutils packages.
- You can use PyEnv to download and install Python versions (Recommended).
Installation steps may look as follows:
- Follow the steps below in How to setup pyenv.
Install a Python interpreter for each supported Python minor version. For example:
```bash
pyenv install 3.8.9
pyenv install 3.9.4
pyenv install 3.10.7
pyenv install 3.11.3
```
For major.minor.patch versions currently used by Jenkins cluster, see Current Installations.
Make installed interpreters available in your shell by running
```bash
pyenv global 3.8.9 3.9.4 3.10.7 3.11.3
```
(OPTIONAL) Pyenv will sometimes fail to make these interpreters directly available without a local configuration. If you see errors trying to use python3.x, then also run pyenv local:
```bash
pyenv local 3.8.9 3.9.4 3.10.7 3.11.3
```
After these steps, all python3.x interpreters should be available in your shell. The first version in the list passed to pyenv global will be used as the default python / python3 interpreter if the minor version is not specified.
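As a quick sanity check after these steps, a loop along the following lines can report which interpreters resolve on PATH. This is a minimal sketch: the helper name `report_interpreters` and the list of minor versions are illustrative assumptions, not part of Beam's tooling.

```bash
# Hypothetical helper: report which python<minor> commands resolve on PATH.
report_interpreters() {
  for minor in "$@"; do
    if command -v "python${minor}" >/dev/null 2>&1; then
      echo "python${minor}: found"
    else
      echo "python${minor}: missing"
    fi
  done
}

# Example: check the minor versions installed above.
report_interpreters 3.8 3.9 3.10 3.11
```

Any version reported as missing is a candidate for the optional pyenv local step.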
...
Use the following code:
```bash
# Initialize virtual environment called "env" in ~/.virtualenvs or any other directory.
# (Consider using pyenv to manage the Python version as well as installed packages in your virtual environment.)
$ python3 -m venv ~/.virtualenvs/env

# Activate virtual environment.
$ . ~/.virtualenvs/env/bin/activate

# Upgrade other tools. (Optional)
(env) $ pip install --upgrade pip
(env) $ pip install --upgrade setuptools

# Install setup.py requirements.
(env) $ pip install -r build-requirements.txt

# Install Apache Beam package in editable mode.
(env) $ pip install -e .[gcp,test]
```
For certain systems, particularly Macs with M1 chips, this installation method may not generate URNs correctly. If running python gen_protos.py doesn't resolve the issue, consult https://github.com/apache/beam/issues/22742#issuecomment-1218216468 for further guidance.
On Windows
Use the following code:
```bash
> c:\Python37\python.exe -m venv c:\path\to\env
> c:\path\to\env\Scripts\activate.bat
# Powershell users should run instead:
> c:\path\to\env\Scripts\activate.ps1
(env) > pip install -e .[gcp,test]
```
You can deactivate the virtualenv when done.
```bash
(env) $ deactivate
```
...
How to setup pyenv
(with pyenv-virtualenv plugin)
- Install prerequisites for your distribution.
- curl https://pyenv.run | bash
- Add the required lines to ~/.bashrc (as returned by the script).
  - Note (12/10/2021): You may have to manually modify .bashrc as described here: https://github.com/pyenv/pyenv-installer/issues/112#issuecomment-971964711. Remove this note if no longer applicable.
- Open a new shell. If the pyenv command is still not available in PATH, you may need to restart the login session.
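If the pyenv command is still missing in a new shell, one thing worth checking is whether the startup file actually contains the initialization line. The sketch below assumes the lines returned by the installer include a `pyenv init` call (verify against the installer's actual output); the helper name `has_pyenv_init` is hypothetical.

```bash
# Hypothetical helper: succeed if the given startup file mentions "pyenv init".
has_pyenv_init() {
  [ -f "$1" ] && grep -q 'pyenv init' "$1"
}

if has_pyenv_init "$HOME/.bashrc"; then
  echo "pyenv init line found in ~/.bashrc"
else
  echo "pyenv init line not found in ~/.bashrc"
fi
```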
...
```bash
# Install pyenv deps
sudo apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
  libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
  xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git

# Install pyenv, and pyenv-virtualenv plugin
curl https://pyenv.run | bash

# Run the outputted commands to initialize pyenv in .bashrc
```
Example: How to Run Unit Tests with PyCharm Using Python 3.8.9 in a virtualenv
- Install Python 3.8.9 and create a virtualenv
  - pyenv install 3.8.9
  - pyenv virtualenv 3.8.9 ENV_NAME
  - pyenv activate ENV_NAME
- Upgrade packages (recommended)
```bash
pip install --upgrade pip setuptools
```
- Set up PyCharm
- Start by adding a new project interpreter (from the bottom right or in Settings).
- Select Existing environment and the interpreter, which should be under ~/.pyenv/versions/3.8.9/envs/ENV_NAME/bin/python or ~/.pyenv/versions/ENV_NAME/bin/python.
- Switch interpreters at the bottom right.
...
```bash
cd sdks/python/
pip install build && python -m build --sdist
```
We will use the tarball built by this command in the --sdk_location parameter.
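To avoid typing the version-specific file name by hand, the newest tarball can be looked up before passing it to --sdk_location. This is a rough sketch: it assumes the build wrote its output to the `dist/` directory (the default for `python -m build`), and the helper name `latest_sdist` is hypothetical.

```bash
# Hypothetical helper: print the most recently modified .tar.gz in a directory.
# Prints nothing if the directory has no matching files.
latest_sdist() {
  ls -t "$1"/*.tar.gz 2>/dev/null | head -n 1
}

# Example: look in dist/ (assumed output directory) and print the flag to use.
SDK_TARBALL="$(latest_sdist dist)"
echo "--sdk_location=${SDK_TARBALL}"
```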
...
```bash
(env) $ pip install tox
(env) $ tox -c tox.ini run -e py38-cloud  # all tests
(env) $ tox -c tox.ini run -e py38 -- -k test_progress
```
...
```bash
# Build portable worker
./gradlew :runners:google-cloud-dataflow-java:worker:build -x spotlessJava -x rat -x test
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar

# Build portable Python SDK harness and publish it to GCP
./gradlew -Pdocker-repository-root=gcr.io/dataflow-build/$USER/beam -p sdks/python/container docker
gcloud docker -- push gcr.io/dataflow-build/$USER/beam/python:latest

# Initialize python
cd sdks/python
virtualenv env
. ./env/bin/activate

# run pipeline
python -m apache_beam.examples.wordcount \
  --runner DataflowRunner \
  --num_workers 1 \
  --project <gcp_project_name> \
  --output <gs://path> \
  --temp_location <gs://path> \
  --sdk_harness_container_image gcr.io/dataflow-build/$USER/beam/python:latest \
  --experiment beam_fn_api \
  --sdk_location build/apache-beam-2.12.0.dev0.tar.gz \
  --debug
```
...
- Click on a recent `Build python source distribution and wheels job` that ran successfully on the github.com/apache/beam master branch from this list.
- Click on List files on Google Cloud Storage Bucket on the right-side panel.
- Expand List file on Google Cloud Storage Bucket in the main panel.
- Locate and download the archive. For example, apache-beam-2.52.0.dev0.tar.gz from GCS.
  - It’s simplest to download the file using your browser by replacing the prefix “gs://” with “https://storage.googleapis.com/”. For example, https://storage.googleapis.com/beam-wheels-staging/master/02bf081d0e86f16395af415cebee2812620aff4b-207975627/apache-beam-2.25.0.dev0.zip
  - Or follow these instructions to download using the gsutil command-line tool.
- Install the downloaded file. For example:
```bash
pip install apache-beam-2.52.0.dev0.tar.gz
# Or, if you need extra dependencies:
pip install apache-beam-2.52.0.dev0.tar.gz[aws,gcp]
```
- When you run your Beam pipeline, pass in the --sdk_location flag pointed at the same file.
```bash
--sdk_location=apache-beam-2.52.0.dev0.tar.gz
```
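The prefix replacement described in the download step (swapping "gs://" for "https://storage.googleapis.com/") can be sketched as a small shell function. The name `gs_to_https` is a hypothetical helper introduced only for illustration; the example URI is the gs:// form of the URL shown in the steps above.

```bash
# Hypothetical helper: rewrite a gs:// URI as a storage.googleapis.com URL.
# Inputs that do not start with gs:// are printed unchanged.
gs_to_https() {
  case "$1" in
    gs://*) echo "https://storage.googleapis.com/${1#gs://}" ;;
    *) echo "$1" ;;
  esac
}

gs_to_https "gs://beam-wheels-staging/master/02bf081d0e86f16395af415cebee2812620aff4b-207975627/apache-beam-2.25.0.dev0.zip"
```

The printed URL can then be opened in a browser or fetched with curl.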
How to update dependencies that are installed in Python container images
When we build Python container images for the Apache Beam SDK, we install PyPI packages of Apache Beam and some additional PyPI dependencies that will likely benefit users. The complete list of dependencies is specified in base_image_requirements.txt files, for each Python minor version. These files are generated from Beam SDK requirements, specified in setup.py, and a short list of additional dependencies specified in base_image_requirements_manual.txt.
We expect all Beam dependencies (including transitive dependencies, and deps for some of the 'extra's, like [gcp]) to be specified with exact versions in the requirements files. Therefore, you may need to regenerate the requirements files when you modify the Python SDK's dependencies in setup.py.
Regenerate the requirements files by running ./gradlew :sdks:python:container:generatePythonRequirementsAll and committing the changes. Execution takes about 5 min per Python version and is somewhat resource-demanding. You can also regenerate the dependencies individually per version with targets like ./gradlew :sdks:python:container:py38:generatePythonRequirements.
To run the command successfully, you will need Python interpreters for all versions supported by Beam. See: Installing Python Interpreters.
NOTE for RELEASE MANAGERS: The updated Python dependency files must be merged into Beam's master branch before cutting the release branch.
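When only a few versions need regenerating, the per-version target names can be derived from the minor version. The sketch below assumes the py38 naming pattern shown above extends to other minor versions (for example py39), which should be checked against the Gradle project; `requirements_target` is a hypothetical helper name.

```bash
# Hypothetical helper: map a Python minor version such as "3.8" to the
# per-version Gradle target, following the py38 pattern (assumed to generalize).
requirements_target() {
  echo ":sdks:python:container:py$(echo "$1" | tr -d '.'):generatePythonRequirements"
}

# Print (rather than run) the command for each version of interest.
for version in 3.8 3.9; do
  echo "./gradlew $(requirements_target "$version")"
done
```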
Errors
You may see the following error with a particular python version like Python 3.6.
...
You may also see the pip command fail with a segmentation fault. If this happens, remove the Python version from pyenv and reinstall it like this:
```bash
CFLAGS="-O2" pyenv install 3.8.9
```
There have been issues with older Python versions. See here for details.