How to update dependency versions?

    • We expect that the majority of upgrades will be done by merging automated Dependabot PRs.
    • When reviewing a dependency update PR, note that increasing an upper bound may not be sufficient to exercise the new version in integration test suites. Committers can temporarily add a commit to the PR that raises the lower bound for the dependency, run the integration tests, and remove that commit once they pass. Sample PR that followed this process: https://github.com/apache/beam/pull/25786/commits. Example of a PR review that followed this process and identified a breaking change: https://github.com/apache/beam/pull/17615.
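
The temporary lower-bound bump described above can be sketched as a small string transformation. This is only an illustration: `raise_lower_bound` and `some_package` are hypothetical names, not part of Beam's tooling, and in practice the change is usually a manual edit to the requirement spec that is reverted before merging.

```python
import re


def raise_lower_bound(spec: str, new_version: str) -> str:
    """Replace the first '>=' bound in a requirement spec with new_version.

    Hypothetical helper: it assumes the spec already contains a '>=' clause.
    """
    return re.sub(
        r">=\s*[^,;\s]+", lambda _: f">={new_version}", spec, count=1)


# Suppose the update PR raised the upper bound from <2 to <3.
updated_spec = "some_package>=1.4.0,<3"

# Temporary commit: force the resolver to pick the new major version
# so that integration tests actually run against it.
temporary_spec = raise_lower_bound(updated_spec, "2.0.0")
print(temporary_spec)  # some_package>=2.0.0,<3
```

Once the integration tests pass, the temporary commit is dropped and the spec returns to `some_package>=1.4.0,<3`.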

When to update the versions?

    • Depending on old versions of our dependencies is an inconvenience to users and can be a ticking time bomb (https://s.apache.org/beam-python-dependencies-pm).
    • Be proactive: update early and escalate any issues downstream as early as possible. Most Dependabot PRs should be merged within a week. Complex upgrades, such as supporting a new major version of a commonly used library (for example, protobuf), may need to be completed across the ecosystem of packages.
    • For upgrades that require a significant amount of work, Beam maintainers should plan to complete the upgrade within a year of the first release of the next (major) version. The sooner, the better.

How to add a new dependency?

    • Set the lower bound to some version you tested. Often, it's the latest available version for the package. 
    • For libraries that claim to follow semantic versioning, cap the upper bound at the next major version. For example: "some_package>=1.4.0, some_package<2"
    • Depending on an exact version or a very narrow range is warranted only in exceptional cases, for example: pickling libraries ()
    • Using a less-than-or-equal sign in an upper bound is wrong (e.g. "some_package<=2.0.0"): it caps the upper bound at one specific version (2.0.0), so the constraint cannot pick up a later patch release such as 2.0.1.
    • For stable dependencies (only bugfix releases, stable API surface), open version bounds are acceptable. Example: 'pytz>=2018.3'.
    • For other dependencies, use upper bounds at the next minor version or decide case by case. Example: 'numpy>=1.14.3,<1.25.0', https://github.com/apache/beam/blob/818c2b44e998529f3e5727a5d30b75922e0d113d/sdks/python/setup.py#L248-L250  
    • When the rationale behind a requirement spec is not obvious, explain it in a comment.
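
The bounding rules above can be illustrated with a short sketch. The helpers `bounded_spec` and `version_tuple` are hypothetical, and the comparison is a naive one for purely numeric dotted versions; real tools follow the full PEP 440 rules.

```python
def bounded_spec(name: str, tested_version: str) -> str:
    """Build 'name>=tested,<next_major' for a package that follows semver."""
    major = int(tested_version.split(".")[0])
    return f"{name}>={tested_version},<{major + 1}"


def version_tuple(version: str) -> tuple:
    """Naive parser for numeric dotted versions such as '2.0.1'."""
    return tuple(int(part) for part in version.split("."))


print(bounded_spec("some_package", "1.4.0"))  # some_package>=1.4.0,<2

# An upper bound of '<=2.0.0' rejects the patch release 2.0.1 ...
allowed_by_le = version_tuple("2.0.1") <= version_tuple("2.0.0")
# ... while an exclusive bound at the next minor version admits it.
allowed_by_lt = version_tuple("2.0.1") < version_tuple("2.1")
print(allowed_by_le, allowed_by_lt)  # False True
```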

Should transitive dependencies be included?

    • Don't manage more than necessary. Do not add constraints on transitive dependencies that Beam does not use directly.
    • If Beam directly uses a transitive dependency, Beam should also directly depend on it (include it in constraints).
    • If a transitive dependency causes issues, add it to our requirements with an appropriate upper bound and a comment explaining when the requirement can be removed.
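
As an illustration of the last point, a capped transitive dependency might look like the following excerpt. All package names and the described incompatibility are hypothetical and are not taken from Beam's actual setup.py.

```python
# Hypothetical excerpt of a dependency list in a setup.py file.
REQUIRED_PACKAGES = [
    # Direct dependency, bounded at the next major version.
    "some_package>=1.4.0,<2",
    # Transitive dependency pulled in by some_package (hypothetical case):
    # its 3.x releases break some_package 1.x.
    # Remove this cap once some_package supports transitive_pkg 3.x.
    "transitive_pkg<3",
]
```

The comment records both why the cap exists and the condition under which it can be deleted, so the constraint does not outlive the problem it works around.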

What to do when installing Beam causes backtracking? 

    • Installing Apache Beam should not require backtracking during dependency resolution: after pip evaluates the set of constraints, each package should be downloaded once. 
    • This is generally possible when using the latest allowed version of each dependency leads to a compatible configuration.
    • When backtracking happens, it can be prevented by adding a constraint that caps the allowed version of a dependency to the last compatible version.

How to find which dependencies are outdated? 

    • Update Beam's base image requirements (recommended).
    • Install Beam dependencies into a clean environment:  `pip install -r sdks/python/container/py310/base_image_requirements.txt` 
    • Check for outdated dependencies: `pip list --outdated`.
    • Ideally, each outdated dependency from our direct dependency list should either have a Dependabot PR in flight or an issue tracking the upgrade. 
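
The check above can be narrowed to direct dependencies with a small script. This is a sketch under assumptions: it relies on pip's `--format=json` output for `pip list --outdated`, the `DIRECT` set is a hypothetical stand-in for the names listed in setup.py, and the sample versions are illustrative only.

```python
import json
import re
import subprocess
import sys


def normalize(name: str) -> str:
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def list_outdated() -> list:
    """Run `pip list --outdated --format=json` in the current environment."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def outdated_direct(outdated: list, direct_names: set) -> list:
    """Keep only the outdated packages that are direct dependencies."""
    wanted = {normalize(name) for name in direct_names}
    return [
        (pkg["name"], pkg["version"], pkg["latest_version"])
        for pkg in outdated
        if normalize(pkg["name"]) in wanted
    ]


# Sample data in the shape pip emits; the versions are illustrative.
sample = [
    {"name": "numpy", "version": "1.24.4", "latest_version": "1.26.0"},
    {"name": "idna", "version": "3.4", "latest_version": "3.6"},
]
DIRECT = {"numpy", "pytz"}  # hypothetical subset of direct dependencies

for name, current, latest in outdated_direct(sample, DIRECT):
    print(f"{name}: {current} -> {latest}")
```

In a real environment, pass `list_outdated()` instead of `sample`; each reported package should then be matched to a Dependabot PR or a tracking issue.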