Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

n2 leaks fds on macOS #14

Open
nico opened this issue Mar 28, 2022 · 7 comments · Fixed by #25
Open

n2 leaks fds on macOS #14

nico opened this issue Mar 28, 2022 · 7 comments · Fixed by #25
Labels
help wanted Extra attention is needed

Comments

@nico
Copy link
Contributor

nico commented Mar 28, 2022

Command::spawn() in the rust stdlib unconditionally calls anon_pipe here: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/process/process_unix.rs#L62

anon_pipe on Linux calls pipe2 to set CLOEXEC on the pipe atomically: https://github.com/rust-lang/rust/blob/521734787ecf80ff12df7ca5998f7ec0b3b7b2c9/library/std/src/sys/unix/pipe.rs#L18

But macOS has no pipe2, so here the stdlib instead calls pipe() followed by set_cloexec: https://github.com/rust-lang/rust/blob/521734787ecf80ff12df7ca5998f7ec0b3b7b2c9/library/std/src/sys/unix/pipe.rs#L35

This means there's a window where the pipe is created but cloexec isn't set on the pipe's FDs yet. If a different thread forks in that window, the pipe's fds get leaked.

(Not that it matters, but set_cloexec is here https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/fd.rs#L204 . On macOS and a few other OSs, it calls ioctl with FIOCLEX, which is just 1 syscall. On Linux and a few others, it calls fcntl with F_GETFD/F_SETFD which is 2 syscalls, but we don't need to call set_cloexec at all in this path on Linux.)

This is likely why n2 -j250 runs out of FDs on macOS, while ninja -j250 works fine.

Possible fixes:

  • Don't fork on background threads. Create subprocess on main thread, and then hand the created subprocess to background thread for waiting on it and to do deps processing

  • Don't use threads per subprocesses at all, use a select loop model instead (but still use threads for deps processing)

  • Don't use the Command object on non-Windows either (we already don't on Windows), and add mutexes around fork and pipe/set_cloexec

Spawning subprocesses takes some time (we made it async in chromium at some point for that reason we looked at making it async in ninja for that reason), so moving it to the main thread would add some amount of work on to the critical path of n2.

@evmar
Copy link
Owner

evmar commented Mar 30, 2022

Having thought about this a bit, I am curious about the "spawning subprocesses takes some time" part. If it's a skip-a-frame kind of latency, like up to 100ms occasionally, then it's probably fine to serialize it in the main process, though I guess that would interact poorly with -j1000. If it's a multi second kind of thing then definitely it needs a different approach.

I'd probably just go with a mutex around calling Command::spawn (instead of ::output) for now, with an eye towards rewriting to ppoll eventually (or tokio maybe, but I'm kinda skeptical of async Rust).

@evmar evmar added the help wanted Extra attention is needed label Mar 30, 2022
@piscisaureus
Copy link
Contributor

Command::spawn() in the rust stdlib unconditionally calls anon_pipe here

I think it is unlikely that FDs from this pipe are leaked.

Both ends are short lived and explicitly closed: the write end is closed immediately after fork() returns, the read end is closed one as soon as the child has called exec().

So the only situation it could cause a problem is when the process literally has 100s of forked-but-not-yet-exec'ed children. Theoretically possible, but it sounds like a stretch to me.

@bnoordhuis
Copy link

Just throwing it out there: libstd uses posix_spawn() under the right conditions (instead of fork + execve) and macOS has a POSIX_SPAWN_CLOEXEC_DEFAULT attribute that does what its name suggests. Teaching libstd about it probably isn't too hard.

we made it async in chromium at some point for that reason

@nico I'm interested, can you say more? My understanding is that as long as you're using fork, there's no getting away from paying the cost of copying the working set.

@nico
Copy link
Contributor Author

nico commented Mar 31, 2022

I think it is unlikely that FDs from this pipe are leaked.

It's possible this theory is wrong. Background is that n2 needs way more FDs than ninja (ninja -j250 works fine on my mac, n2 -j250 dies with "Too many open files (os error 24"), and I compared fs_usage logs.

Python script to generate build.ninja file that repros the problem
% cat fdtest.py
import textwrap
def write(f):
  f.write(textwrap.dedent('''\
    rule b
      command = sleep 300; touch $out
    '''))
  for i in range(1000):
    f.write(f'build foo{i}: b\n')
  # n2 needs an explicit default target:
  f.write('default')
  for i in range(1000):
    f.write(f' foo{i}')
  f.write('\n')
with open('build.ninja', 'w', encoding='utf-8') as f:
  write(f)
% python3 fdtest.py

I ran something like sudo fs_usage -w -f filesys| grep n2 > fs-n2.txt and then ran n2 -j50 for the build file generated by that python script. Then I did the same with ninja, and compared the two txt files.

This was the main difference that stood out, and there is a race there. So it seems plausible to me, but it's well possible this is not the root cause. (It does fit with things seemingly working fine on Linux -- but Linux also has a higher FD limit.)


@nico I'm interested, can you say more?

Turns out I was confusing things! We didn't do this in chromium, we were talking about doing this in ninja, here: https://bugs.chromium.org/p/chromium/issues/detail?id=829710#c3 (but as far as I know didn't end up merging it)

IIRC the background there was that someone was running ninja on a Windows system with some thing installed that made process creation very slow. But if you're in that situation, your build is going to be slow anyways and it's imho not necessarily something tools should go out of their way to mitigate.

So it's probably fine to just start processes on the main thread.

nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
@nico
Copy link
Contributor Author

nico commented Apr 2, 2022

#25 fixes this locally for me, so it does look like that race creation race is indeed the cause of the fd leak.

WIth that patch applied, and the build.ninja file generated by the script in #14 (comment), I can run n2 -j250 and it works just as well as ninja.

I can also build llvm with n2 -j250 using goma, just like with ninja.

nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
nico added a commit to nico/n2 that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes evmar#14.
@nico
Copy link
Contributor Author

nico commented Apr 2, 2022

(I also filed an upstream issue.)

@evmar evmar closed this as completed in #25 Apr 2, 2022
evmar pushed a commit that referenced this issue Apr 2, 2022
Fixes an fd leak on all unix systems that don't have pipe2.
As of today, that's all unix systems that are not dragonfly bsd,
freebsd, linux, netbsd, openbsd, or redox -- in particular, macOS.

Fixes #14.
evmar added a commit that referenced this issue Jul 7, 2023
This avoids issue #14 and gives us lower-level control over the setup of
the pipe.
@nico
Copy link
Contributor Author

nico commented Sep 15, 2023

I think 9c2104a regressed this. The test in #14 (comment) repros this again.

(I don't have permissions to reopen this issue here.)

@evmar evmar reopened this Sep 15, 2023
evmar added a commit that referenced this issue Jan 7, 2025
From #14, this exercises fd exhaustion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants