
subprocess.run stalls indefinitely and consumes all memory when checking ninja version #955

Open
@gareth-cross

Description

  • Python version: 3.10 (installed via conda-forge)
  • scikit-build-core version: 0.10.7 (but issue is present on 0.10.6 as well)
  • OS: Ubuntu 22.04 (WSL, kernel: 5.15.167.4)

Steps to reproduce:

I am running pip wheel --verbose --verbose --verbose . on my project. The build gets this far:

  Created temporary directory: /tmp/pip-build-env-x87al_n8
  Created temporary directory: /tmp/pip-standalone-pip-rpekp9xc
  Running command /home/gareth/repos/wfenv/bin/python /tmp/pip-standalone-pip-rpekp9xc/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x87al_n8/overlay --no-warn-script-location -v --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'scikit-build-core @ file:///home/gareth/repos/scikit-build-core' 'cmake>=3.20,<3.31' 'ninja>=1.5'
...
  Collecting cmake<3.31,>=3.20
    Using cached cmake-3.30.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
  Collecting ninja>=1.5
    Using cached ninja-1.11.1.2-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
...
  Getting requirements to build wheel ... done
...
Building wheels for collected packages: wrenfold
  Created temporary directory: /tmp/pip-wheel-sqd1lqbx
  Destination directory: /tmp/pip-wheel-sqd1lqbx
  Running command /home/gareth/repos/wfenv/bin/python /home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpafsk0kce
  2024-12-03 21:46:10,392 - scikit_build_core - WARNING - cmake should not be in build-system.requires - scikit-build-core will inject it as needed
  2024-12-03 21:46:10,392 - scikit_build_core - WARNING - ninja should not be in build-system.requires - scikit-build-core will inject it as needed
  2024-12-03 21:46:10,413 - scikit_build_core - INFO - RUN: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages/cmake/data/bin/cmake -E capabilities
  2024-12-03 21:46:10,419 - scikit_build_core - INFO - CMake version: 3.30.5
  *** scikit-build-core 0.10.7 using CMake 3.30.5 (wheel)
  2024-12-03 21:46:10,423 - scikit_build_core - INFO - Build directory: /tmp/tmpb32t54no/build
  *** Configuring CMake...
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - SITE_PACKAGES: /home/gareth/repos/wfenv/lib/python3.10/site-packages
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - Extra SITE_PACKAGES: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages
  2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - PATH: ['/home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process', '/tmp/pip-build-env-x87al_n8/site', '/home/gareth/mambaforge/envs/devtools/lib/python310.zip', '/home/gareth/mambaforge/envs/devtools/lib/python3.10', '/home/gareth/mambaforge/envs/devtools/lib/python3.10/lib-dynload', '/tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-x87al_n8/normal/lib/python3.10/site-packages']
  2024-12-03 21:46:10,432 - scikit_build_core - DEBUG - Default generator: Ninja
  2024-12-03 21:46:10,433 - scikit_build_core - INFO - RUN: /home/gareth/repos/wfenv/bin/ninja --version

The process then stalls, and its memory usage grows steadily until the process dies. If I interrupt it, it appears to be stuck reading stdout inside subprocess. I realize this context is a little thin at the moment; I am still gathering debugging information.
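While the root cause is still unknown, a version probe like this can at least be kept from hanging forever by giving it a timeout. The sketch below is a hypothetical helper (probe_version is not part of scikit-build-core and is not how it currently runs ninja) built on the standard subprocess.run timeout parameter. It caps how long the child may run; any output the child produces is still buffered in memory until the timeout fires.

```python
import subprocess
import sys
from typing import Optional, Sequence


def probe_version(cmd: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Run `cmd` (e.g. ["ninja", "--version"]) and return its stripped stdout.

    Hypothetical helper: returns None if the command cannot be started,
    exits with a non-zero status, or does not finish within `timeout`
    seconds. On timeout, subprocess.run kills the child before raising
    TimeoutExpired, so a stalled probe cannot block the caller forever.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a well-behaved `ninja --version`: prints a version and exits.
    print(probe_version([sys.executable, "-c", "print('1.11.1')"]))
    # Stand-in for the stalled case: a child that never exits on its own.
    print(probe_version([sys.executable, "-c", "import time; time.sleep(60)"], timeout=1.0))
```

A timeout only bounds the damage; it does not explain why reading stdout from the venv's ninja stalls in the first place.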

Running /home/gareth/repos/wfenv/bin/ninja --version manually works fine: it prints 1.11.1.git.kitware.jobserver-1 and exits.
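The venv's ninja reports 1.11.1.git.kitware.jobserver-1, which is not a PEP 440 version string, whereas the overlay ninja reports plain 1.11.1. If such a string has to be compared against a minimum like ninja>=1.5, one tolerant approach is to keep only the leading dotted integers. This is a sketch with a hypothetical helper name (leading_version), not necessarily what scikit-build-core does:

```python
import re
from typing import Optional, Tuple

# Leading run of dotted integers, e.g. "1.11.1" in "1.11.1.git.kitware.jobserver-1".
_LEADING_VERSION = re.compile(r"^\s*(\d+(?:\.\d+)*)")


def leading_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the leading dotted integer components of a version string.

    Hypothetical helper: ignores any non-numeric suffix such as
    ".git.kitware.jobserver-1". Returns None if the string does not
    start with a number.
    """
    match = _LEADING_VERSION.match(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))
```

For example, leading_version("1.11.1.git.kitware.jobserver-1") >= (1, 5) is True, so both ninja builds would satisfy the ninja>=1.5 requirement under this scheme.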

One (possibly tangential) question: why does scikit-build-core query the ninja in my virtual environment, /home/gareth/repos/wfenv/bin/ninja (see the INFO line above), rather than the one that pip wheel collected into the build overlay? Is this expected?
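Whatever the answer, it helps to know every ninja executable that a lookup by bare name could hit. shutil.which only returns the first match, so the hypothetical helper below (a diagnostic sketch, not scikit-build-core's own program-discovery logic) checks each PATH entry in turn and lists all matches in search order:

```python
import os
import shutil
from typing import List, Optional


def find_all_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """List every executable called `name` on a PATH-style string, in order.

    Hypothetical diagnostic helper. shutil.which() stops at the first
    match; querying each directory separately also reveals stray copies
    that shadow, or are shadowed by, the one you expect.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    found: List[str] = []
    for entry in path.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH components
        hit = shutil.which(name, path=entry)
        if hit is not None and hit not in found:
            found.append(hit)
    return found


if __name__ == "__main__":
    for candidate in find_all_on_path("ninja"):
        print(candidate)
```

Running this inside the same environment that invokes pip wheel shows whether a copy in wfenv/bin comes before any other ninja.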

Notably, if I uninstall the instance of ninja in wfenv, the build proceeds normally:

  2024-12-03 22:12:27,909 - scikit_build_core - DEBUG - Default generator: Ninja
  2024-12-03 22:12:27,910 - scikit_build_core - INFO - RUN: ninja --version
  2024-12-03 22:12:27,911 - scikit_build_core - INFO - Ninja version: 1.11.1
  2024-12-03 22:12:27,911 - scikit_build_core - DEBUG - CMAKE_GENERATOR: Using ninja: ninja

I instrumented my CMake to check the path to ninja and found:

  -- CMAKE_MAKE_PROGRAM: ninja
  -- Path to make program: /tmp/pip-build-env-aducxdi1/overlay/bin/ninja

This appears to be correct: CMake is using the overlay version.

Of course, I can remove any stray instances of ninja from my virtual environment, but it is somewhat concerning that picking up the wrong one triggers a lock-up followed by an OOM, so I would like to understand this issue better.
