Incorrect platform-specific dependency selected from universal requirements.txt using pip.parse #2690
So the solution is simple: swap the order of evaluation of the markers and the linked line, so that we first evaluate the markers and only then merge multiple requirement lines into one. So instead of doing:
We should evaluate the markers as we go. This won't be super efficient, but it will be correct. It will become much more efficient once I have time to finish #2629. Writing my thoughts here in case someone wants to take a stab.
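The idea can be sketched in plain Python (a toy model, not the rules_python code; the marker "parser" below only understands `sys_platform == '<value>'`, whereas real PEP 508 markers are far richer):

```python
# Sketch: evaluate each requirement line's environment marker *before*
# merging duplicate entries, so lines guarded by a non-matching marker
# never reach the merge step.

def marker_applies(marker: str, env: dict) -> bool:
    """Evaluate a trivial `key == 'value'` marker against an environment."""
    if not marker:
        return True  # no marker means the line applies everywhere
    key, _, value = marker.partition("==")
    return env.get(key.strip()) == value.strip().strip("'\"")

def select_requirements(lines, env):
    """Keep only lines whose marker matches, then merge by package name."""
    selected = {}
    for line in lines:
        req, _, marker = line.partition(";")
        if not marker_applies(marker.strip(), env):
            continue
        name = req.split("[")[0].split("==")[0].strip()
        selected[name] = req.strip()  # later matching lines win, like pip
    return selected

lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
]
print(select_requirements(lines, {"sys_platform": "darwin"}))
# {'optimum': 'optimum[onnxruntime]==1.17.1'}
```

With this ordering, the two `optimum` lines never collide: on each platform at most one of them survives marker evaluation, so there is nothing ambiguous left to merge.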
This implements the PEP 508 compliant marker evaluation in Starlark and removes the need for the Python interpreter when evaluating requirements files passed to `pip.parse`. This makes the evaluation faster and allows us to fix a few known issues (#2690).

In the future the intent is to move the `METADATA` parsing to pure Starlark so that the `RequiresDist` could be parsed in Starlark at the macro evaluation or analysis phases. This should make it possible to more easily solve the design problem that more and more things need to be passed to `whl_library` as args to have robust dependency parsing:

* #2319 needs the full Python version to have correct cross-platform compatible `METADATA` parsing, and passing it to Python and back makes it difficult/annoying to implement.
* Parsing the `METADATA` file requires the precise list of target platforms or the list of available packages in the `requirements.txt`. This means that without it we cannot trim the dependency tree in the `whl_library`. Doing this at the macro loading phase allows us to depend on `.bzl` files in the `hub_repository` and pass information more effectively.

I can remotely see that this could become useful in `py_wheel` or in building wheels from sdists, as the environment markers may be present in various source metadata as well. What is more, the `uv.lock` file has the env markers as part of the lock file information, so this might be useful there.

Work towards #2423
Work towards #260
Split from #2629
Since the marker evaluation is done entirely in Starlark, this should not be too hard. Feel free to submit a PR if your hands get to this faster.
After spending some time on this, I started wondering why this comment is relevant
I think this should be handled by the user, i.e., aggregating extras. However, including the logic of selecting the longest extra makes this trickier to implement, because the extra is platform-dependent, so there is no way to select one or aggregate both. The logic now becomes

IMHO, the solution is just to delete the sorting.
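A tiny sketch of why the longest-name selection misfires once markers are involved (illustrative only; the real selection logic lives in `parse_requirements.bzl`):

```python
# Hypothetical reduction of the sort-by-length heuristic: the marker is
# never consulted, so the GPU variant wins even when resolving for macOS.
candidates = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
]
longest = sorted(candidates, key=len)[-1]
print(longest)  # the linux-only entry, regardless of the target platform
```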
ping @aignas
The sorting and selecting the longest match roughly approximates how
would always work in a way where only the last entry survives. When I was modifying the code, I thought that in case there was a file:
We should select

If I were solving this with the env markers, I would first resolve the markers and then still do the sorting, just to ensure that people with such requirements files do not have regressions. At least in the past, such requirements files would be produced by
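For contrast, the legacy shape the sorting was designed for (no markers, extras that are genuine subsets of each other) can be sketched as:

```python
# With no markers, extras form subsets and the longest name really is the
# most specific requirement, so length-based selection is safe here.
lines = [
    "my_dep",
    "my_dep[foo]",
    "my_dep[foo,bar]",
]
best = sorted(lines, key=len)[-1]
print(best)  # my_dep[foo,bar]
```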
That is correct. I think
in which case just selecting the longest will be inaccurate. I will take a stab with this in mind.
This change addresses a bug where `pip.parse` selects the wrong requirement entry when multiple extras are listed with platform-specific markers.

#### Problem:

In a `requirements.txt` generated by tools like `uv` or `poetry`, it's valid to have multiple entries for the same package, each with different extras and `sys_platform` markers, for example:

```ini
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

The current implementation in [`parse_requirements.bzl`](https://github.com/bazel-contrib/rules_python/blob/032f6aa738a673b13b605dabf55465c6fc1a56eb/python/private/pypi/parse_requirements.bzl#L114-L126) uses a sort-by-length heuristic to select the "best" requirement when there are multiple entries with the same base name. This works well in legacy `requirements.txt` files where:

```
my_dep
my_dep[foo]
my_dep[foo,bar]
```

...would indicate an intent to select the **most specific subset of extras** (i.e. the longest name). However, this heuristic **breaks** in the presence of **platform markers**, where extras are **not subsets** but distinct variants. In the example above, Bazel mistakenly selects `optimum[onnxruntime-gpu]` on macOS because it's a longer match, even though it is guarded by a Linux-only marker.

#### Fix:

This PR modifies the behavior to:

1. **Add the requirement marker** as part of the sorting key.
2. **Then apply the longest-match logic** to drop duplicate requirements with different extras but the same markers.

This ensures that only applicable requirements are considered during resolution, preserving correctness in multi-platform environments.
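The two-step behavior above can be sketched in plain Python (a hypothetical helper, not the actual Starlark implementation): grouping by `(name, marker)` before applying longest-match means deduplication can only collapse entries that share a marker.

```python
def dedup_requirements(lines):
    """Keep the longest requirement per (package name, marker) pair."""
    by_key = {}
    for line in lines:
        req, _, marker = line.partition(";")
        req = req.strip()
        name = req.split("[")[0].split("==")[0].strip()
        key = (name, marker.strip())  # the marker is part of the key
        if len(req) > len(by_key.get(key, "")):
            by_key[key] = req
    return by_key

lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
]
result = dedup_requirements(lines)
# Both entries survive because their markers differ; on any single
# platform, marker evaluation later selects exactly one of them.
```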
#### Before:

On macOS, the following entry is incorrectly selected:

```
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

#### After:

The correct entry is selected:

```
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
```

close #2690

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
Bug report
Affected Rule
`pip.parse`
Is this a regression?
Unknown; this is observed with the current behavior in the latest release.

Description
I'm using a universal `requirements.txt` generated with uv. I want to use platform-specific dependencies like this:

`requirements.in`:

`requirements.txt` (generated):

`MODULE.bazel`:
However, when building on macOS, Bazel attempts to use `optimum[onnxruntime-gpu]` instead of the appropriate `optimum[onnxruntime]`.

It seems that the logic in the following file prioritizes the wrong requirement:
`rules_python/python/private/pypi/parse_requirements.bzl`, lines 114 to 126 at `032f6aa`
As a result, the build fails with:

Exception or Error

To determine the current config, I ran:
Minimal Reproduction

Repro steps:

1. Create `requirements.in` as shown above.
2. Generate `requirements.txt` with `uv pip compile requirements.in`.
3. Use `pip.parse(requirements_by_platform=...)` in `MODULE.bazel`.

Your Environment
Operating System:

Output of `bazel version`:

Rules_python version:

Anything else relevant?
A workaround mentioned by @aignas in Slack is to split requirements into separate files for different platforms. That works, but it would be ideal if this could be handled correctly from a single universal requirements file.
Thanks!