subprocess.run stalls indefinitely and consumes all memory when checking ninja version #955
Comments
From navigating the code I guess you are hitting
And then it fails further down the line when it tries to match the ninja version specification. When you run
$ ninja --version
$ echo $?
do you get a non-zero exit value? That would put it in that branch.
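The same check can be scripted so that a hang surfaces as an error instead of a stall. This is only a minimal sketch, not scikit-build-core's own code: `probe_version` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import subprocess


def probe_version(cmd, timeout=10.0):
    """Run `cmd` and return (exit code, stripped stdout).

    Hypothetical helper for illustration only. If the command does not
    finish within `timeout` seconds, subprocess.run kills it and raises
    subprocess.TimeoutExpired instead of blocking forever.
    """
    result = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        timeout=timeout,
        check=False,          # report a non-zero exit code, do not raise
    )
    return result.returncode, result.stdout.strip()
```

Calling `probe_version(["ninja", "--version"])` in the affected environment would report the exit code and the version string together, which is the information requested above.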
It exits normally with return code 0. If I kill the stalled process with Ctrl-C, it seems like it never escapes out of the call to
I am not really familiar with the expected behavior here. It feels like scikit-build-core invoking the existing ninja install in my venv is incorrect, and that it should instead use the version installed in the build overlay.
It should try the local one first. If it's installed in the build env, you should not be able to get past it (unless it was broken). Though the outer one shouldn't be broken either. Forcing a pip version is not recommended, as some platforms, such as the BSDs, do not have wheels. I will have to investigate, hopefully later today.
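To see which copies of an executable are visible and in what order, something like the following could be run inside the failing environment. It is a rough sketch that only walks `PATH`; it does not reproduce scikit-build-core's actual lookup logic, and `find_all_on_path` is a hypothetical name.

```python
import os
import shutil


def find_all_on_path(name, path=None):
    """Return every executable called `name` on `path`, in lookup order.

    Hypothetical diagnostic helper. `path` defaults to the PATH environment
    variable; each directory is checked on its own with shutil.which.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = shutil.which(name, path=directory)
        if candidate is not None and candidate not in found:
            found.append(candidate)
    return found
```

`find_all_on_path("ninja")` would then list the virtual environment copy and any other copy on `PATH`, first match first.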
I encountered exactly the same problem. What saved me in the end was to uninstall Ninja with
Anything unusual about your setup that I could reproduce? I've tried to reproduce this, but haven't been able to. I've tried something like this:
docker run --rm -it ubuntu:24.10
apt update && apt install python3-venv git
python3 -m venv .venv
. .venv/bin/activate
pip install cmake ninja
git clone https://github.com/wrenfold/wrenfold --recurse-submodules
cd wrenfold/
pip wheel --verbose --verbose --verbose .
But it seems fine.
From my reading of the original post I think it's more on the
I tried conda, same thing, still no lock-up:
apt update && apt install curl git python3-pip
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
eval "$(./bin/micromamba shell hook -s posix)"
micromamba install cmake ninja
micromamba activate base
git clone https://github.com/wrenfold/wrenfold --recurse-submodules
cd wrenfold
pip wheel --verbose --verbose --verbose .
I took another shot at replicating this, under both Ubuntu 22.04 and Ubuntu 24.04 (and Python 3.10 and 3.12). Unfortunately, I cannot replicate it anymore either. The only advice I can give to anybody who experiences the same issue is: uninstall ninja from the virtual env, and use the version installed by scikit-build-core.
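Before uninstalling, it may help to confirm whether a pip-installed ninja distribution is actually present in the active environment. A minimal sketch using only the standard library follows; `installed_version` is a hypothetical helper, and it only sees Python distributions, so a ninja binary installed by conda or the system package manager would not show up here.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of `dist_name` in this environment, or None.

    Hypothetical helper: it inspects Python package metadata only, not
    executables found on PATH.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_version("ninja")` with the virtual environment's interpreter returns a version string if the wheel is installed there and `None` otherwise.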
Protection for #955, though I can't reproduce. Signed-off-by: Henry Schreiner <[email protected]>
3.10 (installed via conda-forge)
0.10.7 (but issue is present on 0.10.6 as well)
5.15.167.4
Steps to reproduce:
I am running
pip wheel --verbose --verbose --verbose .
on my project. The build gets this far:
The process then stalls, and memory usage grows indefinitely until the process dies. If I kill the process, it appears to stop while reading stdout inside subprocess. I realize this context is a little thin at the moment, but I am still trying to gather debugging information.
Running the command
/home/gareth/repos/wfenv/bin/ninja --version
manually has no issues. It prints 1.11.1.git.kitware.jobserver-1 and exits.
One (possibly tangential) question I have is: Why does scikit-build-core query the instance of ninja present in my virtual environment
/home/gareth/repos/wfenv/bin/ninja
(see INFO print above), rather than the version that is collected by pip wheel in the build overlay? Is this expected?
Notably, if I uninstall the instance of ninja in wfenv, the build proceeds normally:
I instrumented my CMake to check the path to ninja and found:
Which appears to be correct - it is using the overlay version.
Of course, I can remove any stray instances of ninja in my virtual environment - but it is somewhat concerning that finding the wrong one triggers a lock-up followed by OOM, so I would like to understand this issue a bit better.
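One defensive pattern against a lock-up followed by OOM is to cap both the run time and the number of bytes read from the child's stdout. The sketch below illustrates that idea in general terms; it is not what scikit-build-core does, `run_with_output_cap` is a hypothetical helper, and the default limits are arbitrary assumptions.

```python
import subprocess
import threading


def run_with_output_cap(cmd, max_bytes=65536, timeout=10.0):
    """Run `cmd`, keeping at most `max_bytes` of stdout.

    Returns (exit code, captured bytes, truncated flag). Hypothetical helper:
    the child is killed if it runs longer than `timeout` seconds or writes
    more than `max_bytes`. stderr is discarded to keep the sketch short.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    # Kill the child if it is still running after `timeout` seconds.
    timer = threading.Timer(timeout, proc.kill)
    timer.start()
    try:
        # Read one byte more than the cap so overflow can be detected.
        data = proc.stdout.read(max_bytes + 1)
        truncated = len(data) > max_bytes
        if truncated:
            proc.kill()  # stop a child that keeps producing output
        proc.stdout.close()
        returncode = proc.wait()
    finally:
        timer.cancel()
    return returncode, data[:max_bytes], truncated
```

With this shape, a probe that never stops writing ends with `truncated` set to `True` and a bounded buffer, rather than with unbounded memory growth.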