Summary
execd can report a successfully-executed command as a failure (CommandExecError) when cmd.Wait() returns ECHILD ("waitid: no child processes"). The command's stdout is produced correctly, but because Wait() cannot retrieve the child's exit status, the result is surfaced as an error with a hardcoded exit code of 1.
The root trigger is in runCommand / runBackgroundCommand: they call signal.Notify(signals) with no signal list (capturing ALL signals, including SIGCHLD and SIGURG) and defer signal.Reset() (a process-global reset). This interferes with the Go runtime's own use of SIGCHLD/SIGURG (child reaping coordination and async preemption) and races across concurrent/sequential commands, occasionally leaving Wait() unable to reap its own child (ECHILD).
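One possible direction for a fix, sketched under assumptions: subscribe only to an explicit list of signals and detach with signal.Stop on the command's own channel, so SIGCHLD and SIGURG are never subscribed to and no process-global reset occurs. The helper name runScoped and the forwarded set (SIGINT, SIGTERM, SIGQUIT) are hypothetical. This is not execd's actual code or the patch under test.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runScoped is a hypothetical stand-in for runCommand. It forwards only an
// explicit set of signals to the child and never touches other handlers.
func runScoped(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	var out bytes.Buffer
	cmd.Stdout = &out

	// Explicit signal list (an assumption about what needs forwarding).
	// SIGCHLD and SIGURG are deliberately absent.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
	// Stop detaches only this channel; unlike signal.Reset() with no
	// arguments, it does not affect other commands' subscriptions.
	defer signal.Stop(sigs)

	if err := cmd.Start(); err != nil {
		return "", err
	}

	// Forward subscribed signals to the child until Wait returns.
	done := make(chan struct{})
	defer close(done)
	go func() {
		for {
			select {
			case s := <-sigs:
				_ = cmd.Process.Signal(s) // error ignored: child may have exited
			case <-done:
				return
			}
		}
	}()

	err := cmd.Wait()
	return out.String(), err
}

func main() {
	out, err := runScoped("echo", "hello")
	fmt.Printf("out=%q err=%v\n", out, err)
}
```

Because each call owns its channel and only calls signal.Stop on it, concurrent commands do not undo each other's subscriptions.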
Version
Reproduction
Run multiple commands back-to-back via the execd command run API, especially ones that fork a background child. cmd.Wait() intermittently returns ECHILD even though the command already finished and produced correct output.
Expected
A command that runs to completion and produces correct output is reported as success (exit status 0). execd should not surface an internal reaping race as a command failure.
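To keep a reaping race distinguishable from a real command failure, the error returned by Wait() could be classified before an exit code is assigned. The sketch below only shows the classification step. The type waitOutcome, its constants, and classifyWait are hypothetical names, and how execd should report the "status unknown" case is a policy decision this example does not settle.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// waitOutcome is a hypothetical three-way classification of a Wait() result.
type waitOutcome int

const (
	outcomeSuccess       waitOutcome = iota // Wait returned nil
	outcomeStatusUnknown                    // child could not be reaped (ECHILD)
	outcomeFailed                           // any other error, including non-zero exit
)

// errSampleECHILD is a synthetic error for demonstration only; it mimics
// the shape of "waitid: no child processes".
var errSampleECHILD = os.NewSyscallError("waitid", syscall.ECHILD)

// classifyWait separates "exit status could not be retrieved" from
// genuine failures instead of mapping both to exit code 1.
func classifyWait(err error) waitOutcome {
	switch {
	case err == nil:
		return outcomeSuccess
	case errors.Is(err, syscall.ECHILD):
		return outcomeStatusUnknown
	default:
		return outcomeFailed
	}
}

func main() {
	fmt.Println(classifyWait(nil) == outcomeSuccess)
	fmt.Println(classifyWait(errSampleECHILD) == outcomeStatusUnknown)
}
```

errors.Is unwraps the *os.SyscallError, so the check does not depend on matching the error string.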
Impact
- Any caller running execd in a sandbox can intermittently get a false CommandExecError for commands that actually succeeded.
- More likely under concurrent/sequential command execution and background-fork workloads.
- Additionally, capturing ALL signals via signal.Notify(signals) steals SIGURG from the Go runtime (used for async goroutine preemption) and uses a process-global signal.Reset(), which has process-wide side effects across concurrent commands.
Note: I am currently testing a patch. If the issue no longer reproduces after a period of testing, I will submit a pull request.