fix(execd): avoid global signal to fix false command failures#1042
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 970cfde872
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
As mentioned at the end of #1041, I need some time to observe whether this patch really resolves the issue. Once this problem no longer occurs, I will convert this PR to "ready". |
Signed-off-by: LavenderQAQ <lavenderqaq.cs@gmail.com>
970cfde to
297f2dd
Compare
There was a problem hiding this comment.
LGTM overall — root cause analysis is solid and the fix is correct. Scoping signal.Notify to an explicit list + using signal.Stop instead of the global signal.Reset cleanly eliminates the SIGCHLD/SIGURG interference and the cross-command race.
Ready to merge once you mark it as ready for review 👍
|
@Pangjiping Thank you for your review. After this fix, I no longer observed the error "waitid: no child processes", so I marked this PR as ready. |
Thanks for this. Great Jobs 🎉 |
Summary
Fixes #1041.
execdcould report a successfully-executed command as a failure (CommandExecError) whencmd.Wait()returned a spuriousECHILD("waitid: no child processes").The root cause was global signal handling in
runCommand/runBackgroundCommand: they calledsignal.Notify(signals)with no signal list (capturing ALL signals, including SIGCHLD and SIGURG) anddefer signal.Reset()(a process-global reset). This interfered with the Go runtime's use of SIGCHLD/SIGURG (child reaping and async preemption) and raced across concurrent/sequential commands, occasionally leavingWait()unable to reap its own child.This change:
signal.Notify(signals)+signal.Reset()withsignal.Notify(signals, forwardSignals...)+signal.Stop(signals), so only an explicit set of signals is forwarded and cleanup is scoped to this channel instead of resetting global handlers.ECHILDfromcmd.Wait()(child already reaped) so a command that ran to completion is reported as success instead of a false failure.Testing
Breaking Changes
Checklist