
Incorrect platform-specific dependency selected from universal requirements.txt using pip.parse #2690

Closed
ewianda opened this issue Mar 21, 2025 · 7 comments · Fixed by #2766
Labels
type: pip pip/pypi integration

Comments

@ewianda
Contributor

ewianda commented Mar 21, 2025

🐞 bug report

Affected Rule

pip.parse

Is this a regression?

Unknown — this is observed with the current behavior in the latest release.

Description

I'm using a universal requirements.txt generated with uv

I want to use platform-specific dependencies like this:

requirements.in:

optimum[onnxruntime]; sys_platform == 'darwin' and sys_platform != 'linux'
optimum[onnxruntime-gpu]; sys_platform == 'linux'

requirements.txt (generated):

optimum[onnxruntime]==1.17.1 ; python_full_version < '3.12' and platform_python_implementation == 'CPython' and sys_platform == 'darwin'
optimum[onnxruntime-gpu]==1.17.1 ; python_full_version < '3.12' and platform_python_implementation == 'CPython' and sys_platform == 'linux'

MODULE.bazel:

pip.parse(
    requirements_by_platform = {
        "//:requirements.txt": "linux_*,osx_*",
    },
)

However, when building on macOS, Bazel attempts to use optimum[onnxruntime-gpu] instead of the appropriate optimum[onnxruntime].

It seems that the logic in the following file prioritizes the wrong requirement:


requirements_dict = {
    (normalize_name(entry[0]), _extract_version(entry[1])): entry
    for entry in sorted(
        parse_result.requirements,
        # Get the longest match and fallback to original WORKSPACE sorting,
        # which should get us the entry with most extras.
        #
        # FIXME @aignas 2024-05-13: The correct behaviour might be to get an
        # entry with all aggregated extras, but it is unclear if we
        # should do this now.
        key = lambda x: (len(x[1].partition("==")[0]), x),
    )
}.values()
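To see why this heuristic misfires, here is a minimal, self-contained Python sketch that mirrors the dict-comprehension dedup above. It is not the actual rules_python code: `extract_version` is a simplified stand-in for the real `_extract_version` helper, and the entries are hand-written tuples.

```python
# Two entries for the same package and version, differing only in extras
# and platform markers, as produced by a universal requirements.txt.
requirements = [
    ("optimum", "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'"),
    ("optimum", "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'"),
]

def extract_version(line):
    # Simplified stand-in for rules_python's _extract_version helper.
    return line.partition("==")[2].partition(" ")[0]

requirements_dict = {
    (name, extract_version(line)): (name, line)
    for name, line in sorted(
        requirements,
        # Same key as above: sort by the length of the part before "==".
        key=lambda x: (len(x[1].partition("==")[0]), x),
    )
}

# Both lines collapse onto the same (name, version) key, so the longer
# "optimum[onnxruntime-gpu]" line wins regardless of its linux-only marker.
print(list(requirements_dict.values()))
```

Because the dict key ignores the markers entirely, the macOS build ends up resolving the GPU variant.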

As a result, the build fails with:

🔥 Exception or Error


rules_python~~pip~py_deps/onnxruntime_gpu/BUILD.bazel:6:12: configurable attribute "actual" in @@rules_python~~pip~py_deps//onnxruntime_gpu:_no_matching_repository doesn't match this configuration: No matching wheel for current configuration's Python version.

The current build configuration's Python version doesn't match any of the Python
wheels available for this distribution. This distribution supports the following Python
configuration settings:
    //_config:is_cp311_cp311_manylinux_2_28_x86_64
    //_config:is_cp311_cp311_manylinux_2_31_x86_64
    //_config:is_cp311_cp311_manylinux_x86_64

To determine the current config, I ran:

bazel config <config_id>

🔬 Minimal Reproduction

Repro steps:

  1. Use a universal requirements.in as shown above.
  2. Generate requirements.txt with uv pip compile requirements.in.
  3. Use pip.parse(requirements_by_platform=...) in MODULE.bazel.
  4. Build on macOS.

🌍 Your Environment

Operating System:

macOS 14.x

Output of `bazel version`:

Bazel 7.x

rules_python version:

1.2

Anything else relevant?

A workaround mentioned by @aignas in Slack is to split requirements into separate files for different platforms. That works, but it would be ideal if this could be handled correctly from a single universal requirements file.
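For reference, the workaround looks roughly like this in MODULE.bazel (the file names are hypothetical; the point is one requirements file per platform group instead of a single universal one):

```starlark
pip.parse(
    requirements_by_platform = {
        "//:requirements_linux.txt": "linux_*",
        "//:requirements_darwin.txt": "osx_*",
    },
)
```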

Thanks!

@aignas aignas added the type: pip pip/pypi integration label Mar 21, 2025
@aignas
Collaborator

aignas commented Mar 22, 2025

So the solution is simple: swap the order of the two steps in the linked code, so that we first evaluate the markers and only then merge multiple requirement lines into one.

So instead of doing:

  1. Building the deduplicated dict first:

         requirements_dict = {
             (normalize_name(entry[0]), _extract_version(entry[1])): entry
             for entry in sorted(
                 parse_result.requirements,
                 # Get the longest match and fallback to original WORKSPACE sorting,
                 # which should get us the entry with most extras.
                 #
                 # FIXME @aignas 2024-05-13: The correct behaviour might be to get an
                 # entry with all aggregated extras, but it is unclear if we
                 # should do this now.
                 key = lambda x: (len(x[1].partition("==")[0]), x),
             )
         }.values()

  2. Then evaluating the markers:

         env_marker_target_platforms = evaluate_markers(ctx, reqs_with_env_markers)

We should evaluate the markers as we go. This won't be super efficient, but it will be correct. It will become much more efficient once I have time to finish #2629.

Writing my thoughts here in case someone wants to take a stab.
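As a rough illustration of the proposed ordering, here is a plain-Python sketch (not Starlark, and not the rules_python implementation). The single-clause `marker_applies` check is a toy stand-in for real PEP 508 marker evaluation:

```python
def marker_applies(marker, env):
    # Toy evaluator: handles only a single "key == 'value'" clause.
    # Real PEP 508 evaluation is much richer; this is illustrative only.
    if not marker:
        return True
    key, _, value = marker.partition("==")
    return env.get(key.strip()) == value.strip().strip("'\"")

def select_requirements(lines, env):
    surviving = {}
    for line in lines:
        spec, _, marker = line.partition(";")
        if not marker_applies(marker.strip(), env):
            continue  # dropped before any merging happens
        name = spec.partition("[")[0].partition("==")[0].strip()
        surviving[name] = line  # merge duplicates only among applicable lines
    return list(surviving.values())

lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
]
print(select_requirements(lines, {"sys_platform": "darwin"}))
# Only the darwin line survives on macOS.
```

Because filtering happens before deduplication, the linux-only entry can never shadow the darwin one.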

aignas added a commit to aignas/rules_python that referenced this issue Mar 23, 2025
This implements the PEP508 compliant marker evaluation in starlark and
removes the need for the Python interpreter when evaluating requirements
files passed to `pip.parse`. This makes the evaluation faster and allows
us to fix a few known issues (bazel-contrib#2690).

In the future the intent is to move the `METADATA` parsing to pure
starlark so that the `RequiresDist` could be parsed in starlark at the
macro evaluation or analysis phases. This should make it possible to
more easily solve the design problem that more and more things need to
be passed to `whl_library` as args to have a robust dependency parsing:
* bazel-contrib#2319 needs the full Python version to have correct cross-platform
  compatible METADATA parsing and passing it to `Python` and back makes
  it difficult/annoying to implement.
* Parsing the `METADATA` file requires the precise list of target
  platform or the list of available packages in the `requirements.txt`.
  This means that without it we cannot trim the dependency tree in the
  `whl_library`. Doing this at macro loading phase allows us to depend
  on `.bzl` files in the `hub_repository` and more effectively pass
  information.

Fixes bazel-contrib#2423
github-merge-queue bot pushed a commit that referenced this issue Mar 30, 2025
This implements the PEP508 compliant marker evaluation in starlark and
removes the need for the Python interpreter when evaluating requirements
files passed to `pip.parse`. This makes the evaluation faster and allows
us to fix a few known issues (#2690).

In the future the intent is to move the `METADATA` parsing to pure
starlark so that the `RequiresDist` could be parsed in starlark at the
macro evaluation or analysis phases. This should make it possible to
more easily solve the design problem that more and more things need to
be passed to `whl_library` as args to have a robust dependency parsing:
* #2319 needs the full Python version to have correct cross-platform
compatible `METADATA` parsing and passing it to `Python` and back makes
  it difficult/annoying to implement.
* Parsing the `METADATA` file requires the precise list of target
  platform or the list of available packages in the `requirements.txt`.
  This means that without it we cannot trim the dependency tree in the
  `whl_library`. Doing this at macro loading phase allows us to depend
  on `.bzl` files in the `hub_repository` and more effectively pass
  information.

I can remotely see that this could become useful in `py_wheel` or in building wheels from sdists, as the environment markers may be present in various source metadata as well. What is more, the `uv.lock` file has the env markers as part of the lock file information, so this might be useful there.

Work towards #2423
Work towards #260
Split from #2629
@aignas
Collaborator

aignas commented Apr 1, 2025

Since the marker evaluation is done entirely in Starlark, this should not be too hard. Feel free to submit a PR if you get to this first.

@ewianda
Contributor Author

ewianda commented Apr 7, 2025

After spending some time on this, I started wondering why this comment is relevant

         # Get the longest match and fallback to original WORKSPACE sorting, 
         # which should get us the entry with most extras. 
         # 
         # FIXME @aignas 2024-05-13: The correct behaviour might be to get an 
         # entry with all aggregated extras, but it is unclear if we 
         # should do this now. 

I think this should be handled by the user, i.e., by aggregating the extras themselves: requirements.txt should not have two lines with different extras for the same platform.

However, keeping the longest-extra selection logic makes this trickier to implement, because the extras are platform-dependent, so there is no single entry to select and no safe way to aggregate them. The logic would have to become: select the longest match when the markers target the same platform, but keep both entries when they target different platforms.

IMHO, the solution is just to delete the sorting.

@ewianda
Contributor Author

ewianda commented Apr 9, 2025

ping @aignas

@aignas
Collaborator

aignas commented Apr 9, 2025

The sorting and selection of the longest match roughly approximates how WORKSPACE behaved in the past, where in a requirements file like:

my_dep ...hashes...
my_dep[foo] ...hashes...

only the last entry would survive.

When I was modifying the code, I thought that in case there was a file:

my_dep
my_dep[foo]
my_dep[bar]
my_dep[foo,bar]

We should select my_dep[foo,bar].

If I were solving this with the env markers, I would first resolve the markers and then still do the sorting, just to ensure that people with such requirements files do not hit regressions. In the past, such requirements files were produced by poetry, and rules_poetry used to break precisely because rules_python did not handle them.
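The subset case above can be sketched like this (hypothetical package names, mirroring the length-based key rather than the exact rules_python code):

```python
lines = ["my_dep", "my_dep[foo]", "my_dep[bar]", "my_dep[foo,bar]"]

best = {}
# Sort shortest-first so longer (more-extras) entries overwrite earlier ones.
for line in sorted(lines, key=len):
    name = line.partition("[")[0]
    best[name] = line

print(best["my_dep"])  # the aggregated my_dep[foo,bar] entry survives
```

When extras really are subsets of one another, last-write-wins on the longest name gives the desired result, which is why the heuristic looked safe in the WORKSPACE era.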

@ewianda
Contributor Author

ewianda commented Apr 9, 2025

That is correct. I think optimum is different, since the extras are completely distinct and not subsets of one another. Another case (probably hypothetical) is, say:

my_dep[bar]; sys_platform == 'linux'
my_dep[foo,bar]; sys_platform == 'darwin'

in which simply selecting the longest entry would be inaccurate.

I will take a stab with this in mind.

@ewianda
Contributor Author

ewianda commented Apr 10, 2025

@aignas I came up with #2766

github-merge-queue bot pushed a commit that referenced this issue Apr 11, 2025
This change addresses a bug where `pip.parse` selects the wrong
requirement entry when multiple extras are listed with platform-specific
markers.

#### 🔍 Problem:
In a `requirements.txt` generated by tools like `uv` or `poetry`, it's
valid to have multiple entries for the same package, each with different
extras and `sys_platform` markers, for example:

```ini
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

The current implementation in
[`parse_requirements.bzl`](https://github.com/bazel-contrib/rules_python/blob/032f6aa738a673b13b605dabf55465c6fc1a56eb/python/private/pypi/parse_requirements.bzl#L114-L126)
uses a sort-by-length heuristic to select the "best" requirement when
there are multiple entries with the same base name. This works well in
legacy `requirements.txt` files where:
```
my_dep
my_dep[foo]
my_dep[foo,bar]
```
...would indicate an intent to select the **most specific subset of
extras** (i.e. the longest name).

However, this heuristic **breaks** in the presence of **platform
markers**, where extras are **not subsets**, but distinct variants. In
the example above, Bazel mistakenly selects `optimum[onnxruntime-gpu]`
on macOS because it's a longer match, even though it is guarded by a
Linux-only marker.

#### ✅ Fix:
This PR modifies the behavior to:
1. **Add the requirement marker** as part of the sorting key.
2. **Then apply the longest-match logic** to drop duplicate requirements
with different extras but the same markers.

This ensures that only applicable requirements are considered during
resolution, preserving correctness in multi-platform environments.
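The idea behind the two steps can be sketched in plain Python (hypothetical helper names; the real change lives in `parse_requirements.bzl`): including the marker in the dedup key keeps entries for different platforms separate, while same-marker duplicates still resolve to the longest name.

```python
lines = [
    "optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'",
    "optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'",
    "my_dep[foo]==1.0 ; sys_platform == 'linux'",
    "my_dep[foo,bar]==1.0 ; sys_platform == 'linux'",
]

def parts(line):
    # Split a requirement line into its base name and its marker string.
    spec, _, marker = line.partition(";")
    name = spec.partition("[")[0].strip()
    return name, marker.strip()

selected = {}
# Shortest-first so that, for a given (name, marker) key, the longest
# (most-extras) entry written last wins the collision.
for line in sorted(lines, key=lambda l: len(l.partition("==")[0])):
    name, marker = parts(line)
    selected[(name, marker)] = line

# The darwin and linux optimum entries now occupy distinct keys, and
# only the same-marker my_dep duplicates are collapsed.
print(sorted(selected.values()))
```

With the marker in the key, platform-specific variants survive side by side instead of shadowing each other.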

#### 🧪 Before:
On macOS, the following entry is incorrectly selected:
```
optimum[onnxruntime-gpu]==1.17.1 ; sys_platform == 'linux'
```

#### ✅ After:
Correct entry is selected:
```
optimum[onnxruntime]==1.17.1 ; sys_platform == 'darwin'
```

close #2690

---------

Co-authored-by: Ignas Anikevicius <[email protected]>