Commit bd5425c
test(security): add Brev E2E tests for command injection and credential sanitization (#1092)
* test(security): add E2E tests for command injection and credential sanitization
Adds two new Brev E2E test suites targeting the vulnerabilities fixed by
PR #119 (Telegram bridge command injection) and PR #156 (credential
exposure in migration snapshots + blueprint digest bypass).
Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution,
  backtick injection, quote-breakout, parameter expansion, process
  table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json
  deletion, credential field stripping, non-credential preservation,
  symlink safety, blueprint digest verification, and pattern-based
  field detection
These tests are expected to FAIL on main (unfixed code) and PASS
once PR #119 and #156 are merged.
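To illustrate the payload categories listed above, here is a minimal sketch of a name validator being exercised against them. The function names `validate_sandbox_name` and `check_payloads`, and the allow-list rule itself (lowercase letters, digits, and hyphens), are assumptions for illustration; the actual validation introduced by PR #119 is not shown in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical allow-list validator: lowercase letters, digits, and hyphens
# only, non-empty, and not starting with a hyphen. The real NemoClaw rule
# may differ.
validate_sandbox_name() {
  case "$1" in
    '' | -* | *[!a-z0-9-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Feed one sample payload per category to the validator. Single quotes keep
# each payload an inert string, so nothing is executed here.
check_payloads() {
  # shellcheck disable=SC2016
  for p in '$(id)' '`id`' "x'; id; echo '" '${HOME}' 'a b'; do
    if validate_sandbox_name "$p"; then
      echo "ACCEPTED (unexpected): $p"
      return 1
    fi
  done
  echo "all payloads rejected"
}
```

An allow-list is used in this sketch because it rejects every shell metacharacter without having to enumerate them.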
Refs: #118, #119, #156, #813
* ci: temporarily disable repo guard for fork testing
* ci: bump bootstrap timeout, skip vLLM on CPU E2E runs
- Add SKIP_VLLM=1 support to brev-setup.sh
- Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap
- Bump beforeAll timeout to 30 min for CPU instances
- Bump workflow timeout to 60 min for 3 test suites
* ci: bump bootstrap timeout to 40 min for sandbox image build
* ci: bump Brev instance to 8x32 for faster Docker builds
* ci: add real-time progress streaming for E2E bootstrap and tests
- Stream SSH output to CI log during bootstrap (no more silence)
- Add timestamps to brev-setup.sh and setup.sh info/warn/fail messages
- Add background progress reporter during sandbox Docker build (heartbeat every 30s showing elapsed time, current Docker step, and last log line)
- Stream test script output to CI log via tee + capture for assertions
- Filter potential secrets from progress heartbeat output
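The "tee + capture" streaming described above can be sketched as follows. `run_streamed` is a hypothetical helper name, not the harness's actual function (which lives in brev-e2e.test.js); the sketch also enables `pipefail`, which a later commit in this series adds to the remote pipelines, so that the command's exit status survives the pipe.

```shell
#!/usr/bin/env bash
# Run a command, stream its combined stdout/stderr live to stderr (standing
# in for the CI log), and save a copy to a file for later assertions.
run_streamed() {
  local capture_file="$1"
  shift
  (
    # Without pipefail, tee's exit status would mask the command's.
    set -o pipefail
    "$@" 2>&1 | tee "$capture_file" >&2
  )
}
```

A caller would pass a capture path followed by the command to run, then grep the capture file for PASS/FAIL lines once the command returns.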
* ci: use NemoClaw launchable for E2E bootstrap
Replace bare 'brev create' + brev-setup.sh with 'brev start' using the
OpenShell-Community launch-nemoclaw.sh setup script. This installs Docker,
OpenShell CLI, and Node.js via the launchable's proven path, then runs
'nemoclaw onboard --non-interactive' to build the sandbox (testing whether
this path is faster than our manual setup.sh).
Changes:
- Default CPU back to 4x16 (8x32 didn't help — bottleneck was I/O)
- Launchable path: brev start + setup-script URL, poll for completion,
  rsync PR branch, npm ci, nemoclaw onboard
- Legacy path preserved (USE_LAUNCHABLE=0)
- Timestamped logging throughout for timing comparison
- New use_launchable workflow input (default: true)
* fix: prevent openshell sandbox create from hanging in non-interactive mode
openshell sandbox create without a command defaults to opening an interactive
shell inside the sandbox. In CI (non-interactive SSH), this hangs forever —
the sandbox goes Ready but the command never returns. The [?2004h terminal
escape sequences in the CI logs were bash enabling bracketed paste mode while
it waited for input.
Add --no-tty -- true so the command exits immediately after the sandbox is
created and Ready.
* fix: source nvm in non-interactive SSH for launchable path
The launchable setup script installs Node.js via nvm, which sets up PATH
in ~/.nvm/nvm.sh. Non-interactive SSH doesn't source .bashrc, so npm/node
commands fail with 'command not found'. Source nvm.sh before running npm
in the launchable path and runRemoteTest.
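A minimal sketch of the fix, assuming nvm is installed in its default location (`~/.nvm`); `load_nvm` is a hypothetical helper name, and the real change may simply inline the `source` call in the remote command string.

```shell
#!/usr/bin/env bash
# Load nvm in a shell that did not read ~/.bashrc (for example `ssh host 'cmd'`).
# NVM_DIR falls back to ~/.nvm, the nvm installer's default location.
load_nvm() {
  export NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
  if [ -s "$NVM_DIR/nvm.sh" ]; then
    # shellcheck disable=SC1091
    . "$NVM_DIR/nvm.sh"
  else
    echo "nvm.sh not found under $NVM_DIR" >&2
    return 1
  fi
}
```

Calling `load_nvm` before `npm ci` in the remote command makes node and npm resolvable; the explicit failure branch avoids a confusing 'command not found' later in the run.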
* fix: setup.sh respects NEMOCLAW_SANDBOX_NAME env var
setup.sh defaulted to 'nemoclaw', ignoring the NEMOCLAW_SANDBOX_NAME env
var that the CI test harness sets to 'e2e-test'. The name is now resolved in
the order $1, then $NEMOCLAW_SANDBOX_NAME, then 'nemoclaw'.
* ci: bump full E2E test timeout to 15 min for install + sandbox build
* ci: don't run full E2E alongside security tests (it destroys the sandbox)
The full E2E test runs install.sh --non-interactive which destroys and
rebuilds the sandbox. When TEST_SUITE=all, this kills the sandbox that
beforeAll created, causing credential-sanitization and telegram-injection
to fail with 'sandbox not running'. Only run full E2E when TEST_SUITE=full.
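The gating rule can be sketched as below. The real decision lives in the JavaScript harness; this is only a shell rendering of the same logic. The function name `suites_for` and the exact per-suite values other than `all` and `full` are assumptions.

```shell
#!/usr/bin/env bash
# Print the suites to run for a given TEST_SUITE value, one per line.
# 'all' deliberately excludes 'full', because the full E2E run destroys and
# rebuilds the sandbox that the security suites depend on.
suites_for() {
  case "$1" in
    full) echo full ;;
    all) printf '%s\n' credential-sanitization telegram-injection ;;
    credential-sanitization | telegram-injection) echo "$1" ;;
    *)
      echo "unknown TEST_SUITE: $1" >&2
      return 1
      ;;
  esac
}
```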
* ci: pre-build base image locally when GHCR image unavailable
On forks or before the first base-image workflow run, the GHCR base image
(ghcr.io/nvidia/nemoclaw/sandbox-base:latest) doesn't exist. This causes
the Dockerfile's FROM to fail. Now setup.sh checks for the base image
and builds Dockerfile.base locally if needed.
On subsequent builds, Docker layer cache makes this near-instant.
Once the GHCR base image is available, this becomes a no-op (docker pull
succeeds and the local build is skipped).
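The pull-then-build fallback can be sketched like this. `ensure_base_image` is a hypothetical helper name and the build context (`.`) is an assumption; the image name and `Dockerfile.base` come from the description above.

```shell
#!/usr/bin/env bash
# Image name taken from the commit message; overridable for forks.
BASE_IMAGE="${BASE_IMAGE:-ghcr.io/nvidia/nemoclaw/sandbox-base:latest}"

# Use the registry image if it can be pulled; otherwise build it locally
# under the same tag so the sandbox Dockerfile's FROM line resolves.
ensure_base_image() {
  if docker pull "$BASE_IMAGE" >/dev/null 2>&1; then
    echo "pulled $BASE_IMAGE"
  else
    echo "pull failed; building $BASE_IMAGE locally"
    docker build -f Dockerfile.base -t "$BASE_IMAGE" .
  fi
}
```

Tagging the local build with the registry name means the sandbox Dockerfile needs no changes between the fork and upstream cases.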
* ci: install nemoclaw CLI after bootstrap in non-launchable path
brev-setup.sh creates the sandbox but doesn't install the host-side
nemoclaw CLI that test scripts need for 'nemoclaw <name> status'.
Add npm install + build + link step after bootstrap.
* fix: use npm_config_prefix for nemoclaw CLI install so it lands on PATH
* fix: npm link from repo root where bin.nemoclaw is defined
* fix(ci): register sandbox in nemoclaw registry after setup.sh bootstrap
setup.sh creates the sandbox via openshell directly but never writes
~/.nemoclaw/sandboxes.json. The security test scripts check
`nemoclaw <name> status` which reads the registry, causing all E2E
runs to fail with 'Sandbox e2e-test not running'.
Write the registry entry after nemoclaw CLI install so the test
scripts can find the sandbox.
* style: shfmt formatting fix in setup.sh
* fix(test): exclude policy presets from C7 secret pattern scan
C7 greps for 'npm_' inside the sandbox and false-positives on
nemoclaw-blueprint/policies/presets/npm.yaml which contains rule
names like 'npm_yarn', not actual credentials. Filter out /policies/
paths from all three pattern checks.
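A minimal sketch of the path filter, assuming the scan is a recursive grep; `scan_for_pattern` is a hypothetical name and the actual C7 check may be structured differently.

```shell
#!/usr/bin/env bash
# List files under a directory that contain a credential-like pattern,
# skipping anything under a /policies/ path. Prints nothing when no file
# matches.
scan_for_pattern() {
  local root="$1" pattern="$2"
  grep -rl -- "$pattern" "$root" 2>/dev/null | grep -v '/policies/' || true
}
```

With this filter, a preset such as policies/presets/npm.yaml containing the rule name 'npm_yarn' no longer counts as a finding, while a file elsewhere containing an 'npm_' token still does.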
* docs(ci): add test suite descriptions to e2e-brev workflow header
Document what each test_suite option runs so maintainers can make an
informed choice from the Actions UI without reading the test scripts.
* ci: re-enable repo guard for e2e-brev workflow
Re-enable the github.repository check so the workflow only runs on
NVIDIA/NemoClaw, not on forks.
* fix(test): update setup-sandbox-name test for NEMOCLAW_SANDBOX_NAME env var
setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of
${1:-nemoclaw}. Update the test to match and add coverage for the env
var fallback path.
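The nested default expansion behaves as follows; `resolve_sandbox_name` is a hypothetical wrapper used only to make the precedence easy to exercise.

```shell
#!/usr/bin/env bash
# Precedence: first positional argument, then NEMOCLAW_SANDBOX_NAME,
# then the literal default 'nemoclaw'.
resolve_sandbox_name() {
  echo "${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}}"
}

# With NEMOCLAW_SANDBOX_NAME unset:      resolve_sandbox_name        -> nemoclaw
# With NEMOCLAW_SANDBOX_NAME=e2e-test:   resolve_sandbox_name        -> e2e-test
# With NEMOCLAW_SANDBOX_NAME=e2e-test:   resolve_sandbox_name custom -> custom
```

Because `:-` is used rather than `-`, an empty argument or an empty env var also falls through to the next candidate.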
* fix(lint): add shellcheck directives for injection test payloads and fix stdio type
* fix(lint): suppress SC2034 for status_output in credential sanitization test
* fix: address CodeRabbit review — timeout, pipefail, fail-closed probes, shell injection in test
- Bump e2e-brev workflow timeout-minutes from 60 to 90
- Add fail-fast when launchable setup exceeds 40-min wait
- Add pipefail to remote pipeline commands in runRemoteTest and npm ci
- Fix backtick shell injection in validateName test loop (use process.argv)
- Make sandbox_exec fail closed with __PROBE_FAILED__ sentinel
- Add probe failure checks in C6/C7 sandbox assertions
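The fail-closed probe pattern from the last two items can be sketched as below. Only the `__PROBE_FAILED__` sentinel comes from the commit; `probe` and `assert_no_findings` are hypothetical names, and the real `sandbox_exec` presumably runs its command inside the sandbox rather than locally.

```shell
#!/usr/bin/env bash
PROBE_FAILED='__PROBE_FAILED__'

# Run a probe command and print its output. If the command cannot run or
# exits non-zero, print the sentinel instead of empty or partial output.
probe() {
  local out
  if out="$("$@" 2>/dev/null)"; then
    printf '%s\n' "$out"
  else
    printf '%s\n' "$PROBE_FAILED"
  fi
}

# Assert that a probe found nothing. A probe that failed to run counts as a
# failure, not as "nothing found".
assert_no_findings() {
  if [ "$1" = "$PROBE_FAILED" ]; then
    echo "FAIL: probe did not run"
    return 1
  elif [ -n "$1" ]; then
    echo "FAIL: findings present"
    return 1
  fi
  echo "PASS"
}
```

Without the sentinel, a probe that dies (for example because the sandbox is unreachable) yields empty output, which a "no secrets found" assertion would wrongly accept. Note that in this sketch the probed command must exit 0 when it finds nothing, so a bare `grep` would need its exit status 1 mapped to success.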
---------
Co-authored-by: Carlos Villela <[email protected]>

1 parent 07589f8 commit bd5425c
4 files changed (+1540, -29)
- .github/workflows
- test/e2e