
Incorrect platform-specific dependency selected from universal requirements.txt using pip.parse #2690

Closed
ewianda opened this issue Mar 21, 2025 · 7 comments · Fixed by #2766
Labels
type: pip pip/pypi integration

Comments

@ewianda
Contributor

ewianda commented Mar 21, 2025

🐞 bug report

Affected Rule

pip.parse

Is this a regression?

Unknown — this is observed with the current behavior in the latest release.

Description

I'm using a universal requirements.txt generated with uv

I want to use platform-specific dependencies like this:

requirements.in:

optimum[onnxruntime]; sys_platform == 'darwin' and sys_platform != 'linux'
optimum[onnxruntime-gpu]; sys_platform == 'linux'

requirements.txt (generated):

optimum[onnxruntime]==1.17.1 ; python_full_version < '3.12' and platform_python_implementation == 'CPython' and sys_platform == 'darwin'
optimum[onnxruntime-gpu]==1.17.1 ; python_full_version < '3.12' and platform_python_implementation == 'CPython' and sys_platform == 'linux'

MODULE.bazel:

pip.parse(
    requirements_by_platform = {
        "//:requirements.txt": "linux_*,osx_*",
    },
)

However, when building on macOS, Bazel attempts to use optimum[onnxruntime-gpu] instead of the appropriate optimum[onnxruntime].

It seems that the logic in the following file prioritizes the wrong requirement:


requirements_dict = {
    (normalize_name(entry[0]), _extract_version(entry[1])): entry
    for entry in sorted(
        parse_result.requirements,
        # Get the longest match and fallback to original WORKSPACE sorting,
        # which should get us the entry with most extras.
        #
        # FIXME @aignas 2024-05-13: The correct behaviour might be to get an
        # entry with all aggregated extras, but it is unclear if we
        # should do this now.
        key = lambda x: (len(x[1].partition("==")[0]), x),
    )
}.values()
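To see why this heuristic misfires, here is a minimal, self-contained Python sketch that mirrors the dict-comprehension dedup above. It is not the actual rules_python code: `extract_version` is a simplified stand-in for the real `_extract_version` helper, and the entries are hand-written tuples.

```python
# Two entries for the same package and version, differing only in extras
# and platform markers, as produced by a universal requirements.txt.
requirements = [
    ("optimum", "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'"),
    ("optimum", "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'"),
]

def extract_version(line):
    # Simplified stand-in for rules_python's _extract_version helper.
    return line.partition("==")[2].partition(" ")[0]

requirements_dict = {
    (name, extract_version(line)): (name, line)
    for name, line in sorted(
        requirements,
        # Same key as above: sort by the length of the part before "==".
        key=lambda x: (len(x[1].partition("==")[0]), x),
    )
}

# Both lines collapse onto the same (name, version) key, so the longer
# "optimum[onnxruntime-gpu]" line wins regardless of its linux-only marker.
print(list(requirements_dict.values()))
```

Because the dict key ignores the markers entirely, the macOS build ends up resolving the GPU variant.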

As a result, the build fails with:

🔥 Exception or Error


rules_python~~pip~py_deps/onnxruntime_gpu/BUILD.bazel:6:12: configurable attribute "actual" in @@rules_python~~pip~py_deps//onnxruntime_gpu:_no_matching_repository doesn't match this configuration: No matching wheel for current configuration's Python version.

The current build configuration's Python version doesn't match any of the Python
wheels available for this distribution. This distribution supports the following Python
configuration settings:
    //_config:is_cp311_cp311_manylinux_2_28_x86_64
    //_config:is_cp311_cp311_manylinux_2_31_x86_64
    //_config:is_cp311_cp311_manylinux_x86_64

To determine the current config, I ran:

bazel config <config_id>

🔬 Minimal Reproduction

Repro steps:

  1. Use a universal requirements.in as shown above.
  2. Generate requirements.txt with uv pip compile requirements.in.
  3. Use pip.parse(requirements_by_platform=...) in MODULE.bazel.
  4. Build on macOS.

🌍 Your Environment

Operating System:

macOS 14.x

Output of `bazel version`:

Bazel 7.x

rules_python version:

1.2

Anything else relevant?

A workaround mentioned by @aignas in Slack is to split requirements into separate files for different platforms. That works, but it would be ideal if this could be handled correctly from a single universal requirements file.
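For reference, the workaround looks roughly like this in MODULE.bazel (the file names are hypothetical; the point is one requirements file per platform group instead of a single universal one):

```starlark
pip.parse(
    requirements_by_platform = {
        "//:requirements_linux.txt": "linux_*",
        "//:requirements_darwin.txt": "osx_*",
    },
)
```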

Thanks!

@aignas aignas added the type: pip pip/pypi integration label Mar 21, 2025
@aignas
Collaborator

aignas commented Mar 22, 2025

So the solution is simple: swap the order of the two steps in the linked code, so that we first evaluate the markers and only then merge multiple requirement lines into one.

So instead of doing:

  1. Building the deduplicated dict first:

         requirements_dict = {
             (normalize_name(entry[0]), _extract_version(entry[1])): entry
             for entry in sorted(
                 parse_result.requirements,
                 # Get the longest match and fallback to original WORKSPACE sorting,
                 # which should get us the entry with most extras.
                 #
                 # FIXME @aignas 2024-05-13: The correct behaviour might be to get an
                 # entry with all aggregated extras, but it is unclear if we
                 # should do this now.
                 key = lambda x: (len(x[1].partition("==")[0]), x),
             )
         }.values()

  2. Then evaluating the markers:

         env_marker_target_platforms = evaluate_markers(ctx, reqs_with_env_markers)

We should evaluate the markers as we go. This won't be super efficient, but it will be correct. It will become much more efficient once I have time to finish #2629.

Writing my thoughts here in case someone wants to take a stab.
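As a rough illustration of the proposed ordering, here is a plain-Python sketch (not Starlark, and not the rules_python implementation). The single-clause `marker_applies` check is a toy stand-in for real PEP 508 marker evaluation:

```python
def marker_applies(marker, env):
    # Toy evaluator: handles only a single "key == 'value'" clause.
    # Real PEP 508 evaluation is much richer; this is illustrative only.
    if not marker:
        return True
    key, _, value = marker.partition("==")
    return env.get(key.strip()) == value.strip().strip("'\"")

def select_requirements(lines, env):
    surviving = {}
    for line in lines:
        spec, _, marker = line.partition(";")
        if not marker_applies(marker.strip(), env):
            continue  # dropped before any merging happens
        name = spec.partition("[")[0].partition("==")[0].strip()
        surviving[name] = line  # merge duplicates only among applicable lines
    return list(surviving.values())

lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
]
print(select_requirements(lines, {"sys_platform": "darwin"}))
# Only the darwin line survives on macOS.
```

Because filtering happens before deduplication, the linux-only entry can never shadow the darwin one.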

aignas added a commit to aignas/rules_python that referenced this issue Mar 23, 2025
This implements the PEP508 compliant marker evaluation in starlark and
removes the need for the Python interpreter when evaluating requirements
files passed to `pip.parse`. This makes the evaluation faster and allows
us to fix a few known issues (bazel-contrib#2690).

In the future the intent is to move the `METADATA` parsing to pure
starlark so that the `RequiresDist` could be parsed in starlark at the
macro evaluation or analysis phases. This should make it possible to
more easily solve the design problem that more and more things need to
be passed to `whl_library` as args to have a robust dependency parsing:
* bazel-contrib#2319 needs the full Python version to have correct cross-platform
  compatible METADATA parsing and passing it to `Python` and back makes
  it difficult/annoying to implement.
* Parsing the `METADATA` file requires the precise list of target
  platform or the list of available packages in the `requirements.txt`.
  This means that without it we cannot trim the dependency tree in the
  `whl_library`. Doing this at macro loading phase allows us to depend
  on `.bzl` files in the `hub_repository` and more effectively pass
  information.

Fixes bazel-contrib#2423
github-merge-queue bot pushed a commit that referenced this issue Mar 30, 2025
This implements the PEP508 compliant marker evaluation in starlark and
removes the need for the Python interpreter when evaluating requirements
files passed to `pip.parse`. This makes the evaluation faster and allows
us to fix a few known issues (#2690).

In the future the intent is to move the `METADATA` parsing to pure
starlark so that the `RequiresDist` could be parsed in starlark at the
macro evaluation or analysis phases. This should make it possible to
more easily solve the design problem that more and more things need to
be passed to `whl_library` as args to have a robust dependency parsing:
* #2319 needs the full Python version to have correct cross-platform
compatible `METADATA` parsing and passing it to `Python` and back makes
  it difficult/annoying to implement.
* Parsing the `METADATA` file requires the precise list of target
  platform or the list of available packages in the `requirements.txt`.
  This means that without it we cannot trim the dependency tree in the
  `whl_library`. Doing this at macro loading phase allows us to depend
  on `.bzl` files in the `hub_repository` and more effectively pass
  information.

I can remotely see that this could become useful in `py_wheel` or in building wheels from sdists, as the environment markers may be present in various source metadata as well. What is more, the `uv.lock` file has the env markers as part of the lock file information, so this might be useful there.

Work towards #2423
Work towards #260
Split from #2629
@aignas
Collaborator

aignas commented Apr 1, 2025

Since the marker evaluation is done entirely in Starlark, this should not be too hard. Feel free to submit a PR if you get to this first.

@ewianda
Contributor Author

ewianda commented Apr 7, 2025

After spending some time on this, I started wondering why this comment is relevant

         # Get the longest match and fallback to original WORKSPACE sorting, 
         # which should get us the entry with most extras. 
         # 
         # FIXME @aignas 2024-05-13: The correct behaviour might be to get an 
         # entry with all aggregated extras, but it is unclear if we 
         # should do this now. 

I think this should be handled by the user, i.e., by aggregating the extras themselves: requirements.txt should not have two lines with different extras for the same platform.

However, keeping the longest-extra selection logic makes this trickier to implement, because the extras are platform-dependent, so there is no single entry to select and no safe way to aggregate them. The logic would have to become: select the longest match when the markers target the same platform, but keep both entries when they target different platforms.

IMHO, the solution is just to delete the sorting.

@ewianda
Contributor Author

ewianda commented Apr 9, 2025

ping @aignas

@aignas
Collaborator

aignas commented Apr 9, 2025

The sorting and selection of the longest match roughly approximates how WORKSPACE behaved in the past, where in a requirements file like:

my_dep ...hashes...
my_dep[foo] ...hashes...

only the last entry would survive.

When I was modifying the code, I thought that in case there was a file:

my_dep
my_dep[foo]
my_dep[bar]
my_dep[foo,bar]

We should select my_dep[foo,bar].

If I were solving this with the env markers, I would first resolve the markers and then still do the sorting, just to ensure that people with such requirements files do not hit regressions. In the past, such requirements files were produced by poetry, and rules_poetry used to break precisely because rules_python did not handle them.
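The subset case above can be sketched like this (hypothetical package names, mirroring the length-based key rather than the exact rules_python code):

```python
lines = ["my_dep", "my_dep[foo]", "my_dep[bar]", "my_dep[foo,bar]"]

best = {}
# Sort shortest-first so longer (more-extras) entries overwrite earlier ones.
for line in sorted(lines, key=len):
    name = line.partition("[")[0]
    best[name] = line

print(best["my_dep"])  # the aggregated my_dep[foo,bar] entry survives
```

When extras really are subsets of one another, last-write-wins on the longest name gives the desired result, which is why the heuristic looked safe in the WORKSPACE era.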

@ewianda
Contributor Author

ewianda commented Apr 9, 2025

That is correct. I think optimum is different, since the extras are completely distinct and not subsets of one another. Another case (probably hypothetical) is, say:

my_dep[bar]; sys_platform == 'linux'
my_dep[foo,bar]; sys_platform == 'darwin'

in which simply selecting the longest entry would be inaccurate.

I will take a stab with this in mind.

@ewianda
Contributor Author

ewianda commented Apr 10, 2025

@aignas I came up with #2766

github-merge-queue bot pushed a commit that referenced this issue Apr 11, 2025
This change addresses a bug where `pip.parse` selects the wrong
requirement entry when multiple extras are listed with platform-specific
markers.

#### 🔍 Problem:
In a `requirements.txt` generated by tools like `uv` or `poetry`, it's
valid to have multiple entries for the same package, each with different
extras and `sys_platform` markers, for example:

```ini
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

The current implementation in
[`parse_requirements.bzl`](https://github.com/bazel-contrib/rules_python/blob/032f6aa738a673b13b605dabf55465c6fc1a56eb/python/private/pypi/parse_requirements.bzl#L114-L126)
uses a sort-by-length heuristic to select the "best" requirement when
there are multiple entries with the same base name. This works well in
legacy `requirements.txt` files where:
```
my_dep
my_dep[foo]
my_dep[foo,bar]
```
...would indicate an intent to select the **most specific subset of
extras** (i.e. the longest name).

However, this heuristic **breaks** in the presence of **platform
markers**, where extras are **not subsets**, but distinct variants. In
the example above, Bazel mistakenly selects `optimum[onnxruntime-gpu]`
on macOS because it's a longer match, even though it is guarded by a
Linux-only marker.

#### ✅ Fix:
This PR modifies the behavior to:
1. **Add the requirement marker** as part of the sorting key.
2. **Then apply the longest-match logic** to drop duplicate requirements
with different extras but the same markers.

This ensures that only applicable requirements are considered during
resolution, preserving correctness in multi-platform environments.
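The idea behind the two steps can be sketched in plain Python (hypothetical helper names; the real change lives in `parse_requirements.bzl`): including the marker in the dedup key keeps entries for different platforms separate, while same-marker duplicates still resolve to the longest name.

```python
lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
    "my_dep[foo]==1.0 ; sys_platform == 'linux'",
    "my_dep[foo,bar]==1.0 ; sys_platform == 'linux'",
]

def parts(line):
    # Split a requirement line into its base name and its marker string.
    spec, _, marker = line.partition(";")
    name = spec.partition("[")[0].strip()
    return name, marker.strip()

selected = {}
# Shortest-first so that, for a given (name, marker) key, the longest
# (most-extras) entry written last wins the collision.
for line in sorted(lines, key=lambda l: len(l.partition("==")[0])):
    name, marker = parts(line)
    selected[(name, marker)] = line

# The darwin and linux optimum entries now occupy distinct keys, and
# only the same-marker my_dep duplicates are collapsed.
print(sorted(selected.values()))
```

With the marker in the key, platform-specific variants survive side by side instead of shadowing each other.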

#### 🧪 Before:
On macOS, the following entry is incorrectly selected:
```
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

#### ✅ After:
Correct entry is selected:
```
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
```

close #2690

---------

Co-authored-by: Ignas Anikevicius <[email protected]>