
Allow using environment for configuring toolchains and pypi with bzlmod #3293

@rickeylev

Description


Over in jax (and some related projects, like xla), they're currently using WORKSPACE and have a pretty bespoke way of configuring their toolchains and pip settings.

They use environment variables to specify the Python version, URL, SHA, and threading mode. Those values are written into generated repos, which in turn feed into the python_repository/python_register_toolchains/pip_parse rules. The net effect is that they have one Python version for the whole build, but can change it without modifying WORKSPACE: CI jobs and users set the variables to change which Python is used, for both the toolchain and pip. This gives them something resembling multi-version support. The basic logic of their WORKSPACE is something like:

```python
load("env.bzl", "env")
env()  # reads env vars, generates '@env'
load("@env//...", "HERMETIC_...")
if url:
    python_register_toolchains(name = "python", tool_versions = <TOOL_VERSIONS entry built from env url/sha>)
else:
    python_register_toolchains(name = "python", python_version = <env version>)
pip_parse(version = <env var>, python_interpreter_target = "@python//:interpreter")
```
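The if/else above boils down to a small pure function from environment values to toolchain arguments. Below is a minimal sketch, written in the subset of Python that Starlark also accepts (no f-strings or classes), so the same logic could sit in a .bzl file. The HERMETIC_PYTHON_SHA256 variable, the 3.11 fallback, and the simplified `custom_runtime` shape are assumptions for illustration, not JAX's actual code or rules_python's `tool_versions` schema.

```python
# Hypothetical sketch: map HERMETIC_* environment values to the arguments a
# WORKSPACE would pass to python_register_toolchains. Uses only syntax that
# Starlark also accepts. HERMETIC_PYTHON_SHA256 is an assumed variable name,
# and "custom_runtime" is a simplified stand-in, not the real tool_versions format.

DEFAULT_PYTHON_VERSION = "3.11"  # hypothetical fallback when the env var is unset

def toolchain_args(env):
    """Returns a dict of (simplified) toolchain arguments derived from env."""
    version = env.get("HERMETIC_PYTHON_VERSION", DEFAULT_PYTHON_VERSION)
    url = env.get("HERMETIC_PYTHON_URL", "")
    args = {"name": "python", "python_version": version}
    if url:
        # A custom runtime archive: its URL (and checksum, if given) would
        # override the built-in download table for this version.
        args["custom_runtime"] = {
            "url": url,
            "sha256": env.get("HERMETIC_PYTHON_SHA256", ""),
        }
    return args
```

For example, `toolchain_args({"HERMETIC_PYTHON_VERSION": "3.15", "HERMETIC_PYTHON_URL": "file:///cpython-3.15.tar.gz"})` returns version 3.15 plus a `custom_runtime` entry pointing at the local tarball with an empty sha256, while an empty environment falls back to the default version. Because the same values would also select the interpreter handed to pip_parse, the toolchain and pip cannot drift apart.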

One thing this env var setup allows that our built-in multi-version setup doesn't is letting the user easily specify an alternative Python runtime. For example, they simply set HERMETIC_PYTHON_VERSION=3.15 and HERMETIC_PYTHON_URL=file:///cpython-3.15.tar.gz, and that runtime is then used for everything, including by pip_parse.

I think there are two basic needs this is trying to serve:
1. Easily overriding the runtime. This allows using a custom-built Python (at head, with sanitizers, etc.).
2. pip_parse can be sensitive to the runtime used. This design helps ensure the right interpreter is used, in particular when it is a free-threading interpreter.

For (1), local toolchain rules should mostly be able to handle this. The problem I see is that several bzlmod APIs want a string literal for the Python version, but with such a toolchain we don't know the version until the interpreter is run.
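Since the version is only known once the interpreter runs, a local runtime repository rule has to execute it during fetch (for example with `repository_ctx.execute`) and capture the output. A minimal probe such a rule could run is sketched below; it also reports whether the interpreter is a free-threading build, which matters for need (2). The JSON output shape is an assumption; `sysconfig.get_config_var("Py_GIL_DISABLED")` is truthy on free-threaded CPython builds (3.13+) and 0 or None otherwise.

```python
# Hypothetical probe a local-runtime repository rule could run with the
# discovered interpreter (e.g. `python3 probe.py`) and then parse as JSON.
import json
import sys
import sysconfig


def probe():
    """Describes the interpreter that is executing this script."""
    # Py_GIL_DISABLED is 1 on free-threaded CPython builds (3.13+);
    # it is 0 or None (unset) on regular builds and older versions.
    freethreaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    return {
        "major": sys.version_info.major,
        "minor": sys.version_info.minor,
        "micro": sys.version_info.micro,
        "implementation": sys.implementation.name,
        "freethreaded": freethreaded,
    }


if __name__ == "__main__":
    print(json.dumps(probe()))
```

Emitting structured output rather than parsing `python3 --version` keeps the free-threading flag and the version in one place, so the same result could feed both the toolchain and pip.parse.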

For (2), pip.parse can use python_interpreter_target to point to a local runtime. However, (a) python_version is required, and we don't know it, and (b) pip.parse is strict about duplicate calls.

Sketching a MODULE.bazel, I came up with this:

```python
local_runtime = use_repo_rule(...)
local_runtime(name = "local_runtime", path = "python3")
local_toolchain = use_repo_rule(...)
local_toolchain(name = "local_toolchains", repos = ["local_runtime"], TCW = <//:py=local>)
register_toolchains("@local_toolchains//...")

pip = use_extension(...)

pip.parse(
    hub_name = "pypi",
    python_version = ???,
    python_interpreter_target = "@local_runtime//:python3",
    requirements_lock = "//:requirements-local.txt",
    config_settings = ???,
)
```

```shell
# Run
export PATH=$PATH:/cpython-src/build/python3.15
bazel build --@//:py=local //:foo
```
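The `--@//:py=local` flag assumes the root BUILD file declares a user-defined build setting. A minimal sketch using bazel_skylib's `string_flag` (which needs a `bazel_dep` on `bazel_skylib`) plus a `config_setting` that the local toolchain could list in its target settings; the default value `hermetic` and the name `is_py_local` are hypothetical.

```python
# Root BUILD.bazel (hypothetical): declares the //:py flag used as --@//:py=local.
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

string_flag(
    name = "py",
    build_setting_default = "hermetic",  # hypothetical default value
    values = ["hermetic", "local"],
)

# Matches when the build runs with --@//:py=local; the local toolchain
# (and possibly pip.parse's config settings) could key off this target.
config_setting(
    name = "is_py_local",
    flag_values = {":py": "local"},
)
```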

What to pass for python_version and config_settings in pip.parse is unclear.

Maybe add a python_version_target = "@local_runtime//:version.txt" attribute? Getting rid of the python_version attribute entirely might be even better. Is it actually required (during the bzlmod/repo phase) if the interpreter target is given explicitly?
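If the runtime repo exposed a version file as proposed, the pip extension would need to read it during the extension phase (for example via `module_ctx.read` on the file's path) and reduce it to the short X.Y string that python_version takes today. A minimal sketch of that normalization, in Starlark-compatible Python; the file format (a single line such as `3.15.0a1`, or `python3 --version` output) is an assumption.

```python
# Hypothetical helper: normalize the contents of a version file to "MAJOR.MINOR".
# Assumed inputs: "3.15.0a1" or "Python 3.15.0a1", optionally with a trailing newline.
# Uses only syntax Starlark also accepts, so it could live in a .bzl file.

def parse_version_txt(content):
    """Returns "X.Y" for a version string, or None if it is malformed."""
    text = content.strip()
    if text.startswith("Python "):
        # Accept `python3 --version` output as well.
        text = text[len("Python "):]
    parts = text.split(".")
    if len(parts) < 2 or not parts[0].isdigit() or not parts[1].isdigit():
        return None
    return "%s.%s" % (parts[0], parts[1])
```

Returning None instead of failing lets the caller decide whether a missing or malformed file is an error or a reason to fall back to an explicit python_version.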

I'm not sure what config_settings should be for pip.parse. Maybe just match what is set on the toolchain?

Some misc improvements to the local toolchain that might help:

  • Allow local toolchains to get Python from a particular env var. (Modifying PATH seems invasive and potentially expensive, since a changed PATH can invalidate cached actions and repositories.)
  • Generate a bzl file with the detected Python version. There are various contexts where the Python version is needed as a loading-phase string (py_wheel, py_binary.python_version, among others). These should probably be updated to accept a label for the Python version, where feasible.
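Both bullets could be handled inside the local runtime repository rule. A minimal sketch of the two pieces of logic, in Starlark-compatible Python: picking the interpreter from a dedicated env var before falling back to a PATH lookup, and rendering a loadable .bzl file from the probed version. The env var LOCAL_PYTHON_INTERPRETER and the PYTHON_VERSION* symbol names are hypothetical; in a real rule the env var would be read with something like `repository_ctx.getenv` (so a change triggers a refetch) and the text written with `repository_ctx.file`.

```python
# Hypothetical helpers for a local-runtime repository rule. The names
# LOCAL_PYTHON_INTERPRETER and PYTHON_VERSION* are illustrative only.

def resolve_interpreter(env, default = "python3"):
    """Prefers an interpreter named by a dedicated env var over a PATH lookup."""
    path = env.get("LOCAL_PYTHON_INTERPRETER", "")
    return path if path else default

def render_version_bzl(info):
    """Renders .bzl source from probe output (major/minor/micro/freethreaded)."""
    version = "%d.%d" % (info["major"], info["minor"])
    lines = [
        "# Generated by the local runtime repository rule; do not edit.",
        'PYTHON_VERSION = "%s"' % version,
        'PYTHON_VERSION_FULL = "%s.%d"' % (version, info["micro"]),
        "PYTHON_FREETHREADED = %s" % ("True" if info["freethreaded"] else "False"),
    ]
    return "\n".join(lines) + "\n"
```

A BUILD or .bzl file could then load PYTHON_VERSION from the generated file wherever a loading-phase string is still required, until those APIs accept a label instead.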
