[NemoClaw] nemoclaw-start.sh has no signal handlers — SIGTERM orphans child processes and skips graceful shutdown #1015

@latenighthackathon

Description

scripts/nemoclaw-start.sh is the container entrypoint (PID 1) but has zero trap statements. When the container receives SIGTERM during shutdown, wait "$GATEWAY_PID" on line 298 is interrupted and the script exits immediately without:

  • Forwarding SIGTERM to the gateway process
  • Cleaning up the auto-pair watcher (whose PID is never captured — only echo'd on line 205)
  • Flushing logs or performing any graceful teardown

What happens: On docker stop or sandbox destroy, Docker sends SIGTERM to PID 1. The script installs no handler, so the signal is never forwarded: the gateway gets no clean SIGTERM and is instead killed with SIGKILL, either when the script exits or when the Docker grace period expires. The auto-pair Python watcher (backgrounded on line 146 with &) is orphaned entirely, because its PID is printed to stdout but never stored in a variable and nothing can signal it.

What should happen: The entrypoint should register a trap handler that forwards signals to child processes and waits for them to exit cleanly.

Reproduction Steps

  1. Start a NemoClaw sandbox
  2. Run docker stop <container> or nemoclaw <name> destroy
  3. Observe that the gateway process does not receive SIGTERM; it receives SIGKILL after Docker's default 10-second grace period
  4. Check for orphaned auto-pair processes

Code evidence — no trap in the entire file:

$ grep -n 'trap ' scripts/nemoclaw-start.sh
# (no output)

Auto-pair PID never captured (lines 146 and 205):

# Line 146: launched in background
OPENCLAW_BIN="$OPENCLAW" nohup "${run_prefix[@]}" python3 - <<'PYAUTOPAIR' >>/tmp/auto-pair.log 2>&1 &
# ...
# Line 205: PID printed but NOT stored
echo "[gateway] auto-pair watcher launched (pid $!)"
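A minimal sketch of the missing capture step, assuming `sleep` as a stand-in for the real watcher command. `AUTO_PAIR_PID` and `WATCHER_STATUS` are illustrative names, not variables that exist in the current script. The point is only that `$!` has to be stored right after the `&`, so the PID can be signalled and reaped later:

```shell
#!/usr/bin/env bash
# Stand-in for the auto-pair watcher (assumption: the real command is the
# python3 heredoc launched on line 146).
sleep 30 &

# $! is the PID of the most recent background job. Store it immediately,
# before another background command overwrites it.
AUTO_PAIR_PID=$!
echo "[gateway] auto-pair watcher launched (pid $AUTO_PAIR_PID)"

# With the PID stored, the watcher can be signalled and reaped later.
kill -TERM "$AUTO_PAIR_PID"
wait "$AUTO_PAIR_PID"
WATCHER_STATUS=$?   # 128 + 15 when the child was terminated by SIGTERM
```

The echo still prints the same message as today; the only change is that the PID also survives in a variable.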

Wait without signal forwarding (lines 296–298):

# This script is PID 1 (ENTRYPOINT); if it exits, Docker kills all children.
wait "$GATEWAY_PID"

Environment

  • Code review of main branch (commit HEAD as of 2026-03-26)
  • Affected file: scripts/nemoclaw-start.sh — entire file (lines 1–299)
  • Both root and non-root paths affected (root: line 298, non-root: line 245)

Logs

Expected pattern for a PID 1 entrypoint:

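# Capture the watcher PID right after backgrounding it.
AUTO_PAIR_PID=$!

# Forward the signal to both children, then reap them before exiting.
cleanup() {
  kill -TERM "$GATEWAY_PID" "$AUTO_PAIR_PID" 2>/dev/null
  wait "$GATEWAY_PID" "$AUTO_PAIR_PID" 2>/dev/null
}
trap cleanup SIGTERM SIGINT

# A trapped signal interrupts this wait; cleanup then runs.
wait "$GATEWAY_PID"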
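The pattern can be exercised outside a container with a toy entrypoint. This is a sketch under stated assumptions, not the proposed patch: `sleep` stands in for both the gateway and the watcher, the temp-file layout and the `pids` file are hypothetical scaffolding for the demo, and the explicit `exit 0` in `cleanup` is an addition whose exit status is a design choice.

```shell
#!/usr/bin/env bash
# Demo harness: write a toy entrypoint to a temp dir, run it, and send it
# SIGTERM the way `docker stop` would signal PID 1.
demo_dir=$(mktemp -d)

cat >"$demo_dir/entrypoint.sh" <<'ENTRYPOINT'
#!/usr/bin/env bash
# Stand-ins for the gateway and the auto-pair watcher (assumption).
sleep 60 &
GATEWAY_PID=$!
sleep 60 &
AUTO_PAIR_PID=$!

cleanup() {
  echo "forwarding SIGTERM to $GATEWAY_PID $AUTO_PAIR_PID"
  kill -TERM "$GATEWAY_PID" "$AUTO_PAIR_PID" 2>/dev/null
  # Reap both children so they are gone before the entrypoint exits.
  wait "$GATEWAY_PID" "$AUTO_PAIR_PID" 2>/dev/null
  echo "children exited"
  # Exit explicitly so the script does not fall through to the wait below.
  exit 0
}
trap cleanup TERM INT

# Demo-only: record the child PIDs so the harness can check them afterwards.
echo "$GATEWAY_PID $AUTO_PAIR_PID" >"$(dirname "$0")/pids"
wait "$GATEWAY_PID"
ENTRYPOINT

bash "$demo_dir/entrypoint.sh" >"$demo_dir/out.log" 2>&1 &
ENTRY_PID=$!

# The pids file is written after the trap is installed, so once it is
# non-empty the entrypoint is ready to be signalled.
for i in 1 2 3 4 5 6 7 8 9 10; do
  [ -s "$demo_dir/pids" ] && break
  sleep 1
done

kill -TERM "$ENTRY_PID"
wait "$ENTRY_PID"
ENTRY_STATUS=$?
read -r CHILD_A CHILD_B <"$demo_dir/pids"
cat "$demo_dir/out.log"
```

After the run, neither stand-in child should still be alive and the log should show both messages from `cleanup`. Removing the `trap` line from the toy entrypoint leaves the two `sleep` processes running after the entrypoint is terminated, which mirrors the behaviour reported here.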

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate
