fix(execd): avoid global signal to fix false command failures #1042

Merged
Pangjiping merged 1 commit into opensandbox-group:main from LavenderQAQ:fix/wait-child
Jun 16, 2026

LavenderQAQ (Contributor) commented Jun 12, 2026

Summary

Fixes #1041.

execd could report a successfully executed command as a failure (CommandExecError) when cmd.Wait() returned a spurious ECHILD ("waitid: no child processes").
The root cause was global signal handling in runCommand / runBackgroundCommand. Both called signal.Notify(signals) with no signal list, which captures ALL signals (including SIGCHLD and SIGURG), and then ran defer signal.Reset(), which resets signal handling process-wide. This interfered with the Go runtime's use of SIGCHLD/SIGURG (child reaping and async preemption) and raced across concurrent and sequential commands, occasionally leaving Wait() unable to reap its own child.
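To illustrate why the no-argument form is risky, the sketch below registers a channel with signal.Notify(ch) and no signal list, the way the old code did, then runs a short-lived child. It assumes a Linux/Unix host with a `true` binary on PATH; the helper name sawSIGCHLD is hypothetical and not part of execd. On such a host the channel typically receives SIGCHLD when the child exits, and it may also pick up SIGURG from goroutine preemption, which is exactly the traffic the old code was capturing.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// sawSIGCHLD (hypothetical helper) registers for ALL signals, mirroring the old
// signal.Notify(ch) call, runs a child, and reports whether the child's exit
// notification (SIGCHLD) was relayed to our channel.
func sawSIGCHLD() (bool, error) {
	ch := make(chan os.Signal, 16)
	signal.Notify(ch) // no signal list: every incoming signal is relayed
	defer signal.Stop(ch)

	if err := exec.Command("true").Run(); err != nil {
		return false, err
	}

	timeout := time.After(2 * time.Second)
	for {
		select {
		case s := <-ch:
			if s == syscall.SIGCHLD {
				return true, nil
			}
			// Other signals (for example SIGURG, used by the runtime for
			// async preemption) are captured too; skip them and keep waiting.
		case <-timeout:
			return false, nil
		}
	}
}

func main() {
	ok, err := sawSIGCHLD()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("SIGCHLD relayed to channel:", ok)
}
```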

This change:

  1. Replaces signal.Notify(signals) + signal.Reset() with signal.Notify(signals, forwardSignals...) + signal.Stop(signals), so only an explicit set of signals is forwarded and cleanup is scoped to this channel instead of resetting global handlers.
  2. Ignores spurious ECHILD from cmd.Wait() (child already reaped) so a command that ran to completion is reported as success instead of a false failure.

Testing

  • Unit tests
  • e2e / manual verification

Breaking Changes

  • None

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

LavenderQAQ marked this pull request as draft June 12, 2026 14:31

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 970cfde872

Comment thread on components/execd/pkg/runtime/command.go (outdated)
LavenderQAQ (Contributor, Author) commented

As mentioned at the end of #1041, I need some time to observe whether this patch really resolves the issue. Once I confirm the problem no longer occurs, I will mark this PR as ready for review.

Signed-off-by: LavenderQAQ <lavenderqaq.cs@gmail.com>
LavenderQAQ changed the title from "fix(execd): avoid global signal reset and ignore spurious ECHILD" to "fix(execd): avoid global signal to fix false command failures" Jun 12, 2026

Pangjiping (Collaborator) left a comment

LGTM overall — root cause analysis is solid and the fix is correct. Scoping signal.Notify to an explicit list + using signal.Stop instead of the global signal.Reset cleanly eliminates the SIGCHLD/SIGURG interference and the cross-command race.

Ready to merge once you mark it as ready for review 👍

Pangjiping self-assigned this Jun 15, 2026
Pangjiping added the bug (Something isn't working) and component/execd labels Jun 15, 2026
LavenderQAQ marked this pull request as ready for review June 16, 2026 13:11
LavenderQAQ (Contributor, Author) commented Jun 16, 2026

@Pangjiping Thank you for your review. After this fix, I no longer observed the error "waitid: no child processes", so I marked this PR as ready.

Pangjiping (Collaborator) commented

Thanks for this. Great job! 🎉

Pangjiping (Collaborator) left a comment

LGTM

Pangjiping merged commit 5e838ed into opensandbox-group:main Jun 16, 2026
12 checks passed
LavenderQAQ deleted the fix/wait-child branch June 16, 2026 13:14

Labels

bug (Something isn't working), component/execd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

execd: global signal capture/reset causes spurious ECHILD and false command failures

2 participants