Commit bd5425c

jyaunches and cv authored
test(security): add Brev E2E tests for command injection and credential sanitization (#1092)
* test(security): add E2E tests for command injection and credential sanitization

Adds two new Brev E2E test suites targeting the vulnerabilities fixed by PR #119 (Telegram bridge command injection) and PR #156 (credential exposure in migration snapshots + blueprint digest bypass).

Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution, backtick injection, quote-breakout, parameter expansion, process table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json deletion, credential field stripping, non-credential preservation, symlink safety, blueprint digest verification, and pattern-based field detection

These tests are expected to FAIL on main (unfixed code) and PASS once PR #119 and #156 are merged.

Refs: #118, #119, #156, #813

* ci: temporarily disable repo guard for fork testing

* ci: bump bootstrap timeout, skip vLLM on CPU E2E runs

- Add SKIP_VLLM=1 support to brev-setup.sh
- Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap
- Bump beforeAll timeout to 30 min for CPU instances
- Bump workflow timeout to 60 min for 3 test suites

* ci: bump bootstrap timeout to 40 min for sandbox image build

* ci: bump Brev instance to 8x32 for faster Docker builds

* ci: add real-time progress streaming for E2E bootstrap and tests

- Stream SSH output to CI log during bootstrap (no more silence)
- Add timestamps to brev-setup.sh and setup.sh info/warn/fail messages
- Add background progress reporter during sandbox Docker build (heartbeat every 30s showing elapsed time, current Docker step, and last log line)
- Stream test script output to CI log via tee + capture for assertions
- Filter potential secrets from progress heartbeat output

* ci: use NemoClaw launchable for E2E bootstrap

Replace bare 'brev create' + brev-setup.sh with 'brev start' using the OpenShell-Community launch-nemoclaw.sh setup script. This installs Docker, OpenShell CLI, and Node.js via the launchable's proven path, then runs 'nemoclaw onboard --non-interactive' to build the sandbox (testing whether this path is faster than our manual setup.sh).

Changes:
- Default CPU back to 4x16 (8x32 didn't help; the bottleneck was I/O)
- Launchable path: brev start + setup-script URL, poll for completion, rsync PR branch, npm ci, nemoclaw onboard
- Legacy path preserved (USE_LAUNCHABLE=0)
- Timestamped logging throughout for timing comparison
- New use_launchable workflow input (default: true)

* fix: prevent openshell sandbox create from hanging in non-interactive mode

openshell sandbox create without a command defaults to opening an interactive shell inside the sandbox. In CI (non-interactive SSH), this hangs forever: the sandbox goes Ready but the command never returns. The [?2004h] terminal escape codes in CI logs were bash enabling bracketed paste mode, waiting for input.

Add --no-tty -- true so the command exits immediately after the sandbox is created and Ready.

* fix: source nvm in non-interactive SSH for launchable path

The launchable setup script installs Node.js via nvm, which sets up PATH in ~/.nvm/nvm.sh. Non-interactive SSH doesn't source .bashrc, so npm/node commands fail with 'command not found'. Source nvm.sh before running npm in the launchable path and runRemoteTest.

* fix: setup.sh respects NEMOCLAW_SANDBOX_NAME env var

setup.sh defaulted to 'nemoclaw' ignoring the NEMOCLAW_SANDBOX_NAME env var set by the CI test harness (e2e-test). Now uses $1 > $NEMOCLAW_SANDBOX_NAME > nemoclaw.

* ci: bump full E2E test timeout to 15 min for install + sandbox build

* ci: don't run full E2E alongside security tests (it destroys the sandbox)

The full E2E test runs install.sh --non-interactive which destroys and rebuilds the sandbox. When TEST_SUITE=all, this kills the sandbox that beforeAll created, causing credential-sanitization and telegram-injection to fail with 'sandbox not running'.

Only run full E2E when TEST_SUITE=full.

* ci: pre-build base image locally when GHCR image unavailable

On forks or before the first base-image workflow run, the GHCR base image (ghcr.io/nvidia/nemoclaw/sandbox-base:latest) doesn't exist. This causes the Dockerfile's FROM to fail.

Now setup.sh checks for the base image and builds Dockerfile.base locally if needed. On subsequent builds, Docker layer cache makes this near-instant. Once the GHCR base image is available, this becomes a no-op (docker pull succeeds and the local build is skipped).

* ci: install nemoclaw CLI after bootstrap in non-launchable path

brev-setup.sh creates the sandbox but doesn't install the host-side nemoclaw CLI that test scripts need for 'nemoclaw <name> status'. Add npm install + build + link step after bootstrap.

* fix: use npm_config_prefix for nemoclaw CLI install so it lands on PATH

* fix: npm link from repo root where bin.nemoclaw is defined

* fix(ci): register sandbox in nemoclaw registry after setup.sh bootstrap

setup.sh creates the sandbox via openshell directly but never writes ~/.nemoclaw/sandboxes.json. The security test scripts check `nemoclaw <name> status` which reads the registry, causing all E2E runs to fail with 'Sandbox e2e-test not running'.

Write the registry entry after nemoclaw CLI install so the test scripts can find the sandbox.

* style: shfmt formatting fix in setup.sh

* fix(test): exclude policy presets from C7 secret pattern scan

C7 greps for 'npm_' inside the sandbox and false-positives on nemoclaw-blueprint/policies/presets/npm.yaml which contains rule names like 'npm_yarn', not actual credentials. Filter out /policies/ paths from all three pattern checks.

* docs(ci): add test suite descriptions to e2e-brev workflow header

Document what each test_suite option runs so maintainers can make an informed choice from the Actions UI without reading the test scripts.

* ci: re-enable repo guard for e2e-brev workflow

Re-enable the github.repository check so the workflow only runs on NVIDIA/NemoClaw, not on forks.

* fix(test): update setup-sandbox-name test for NEMOCLAW_SANDBOX_NAME env var

setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of ${1:-nemoclaw}. Update the test to match and add coverage for the env var fallback path.

* fix(lint): add shellcheck directives for injection test payloads and fix stdio type

* fix(lint): suppress SC2034 for status_output in credential sanitization test

* fix: address CodeRabbit review (timeout, pipefail, fail-closed probes, shell injection in test)

- Bump e2e-brev workflow timeout-minutes from 60 to 90
- Add fail-fast when launchable setup exceeds 40-min wait
- Add pipefail to remote pipeline commands in runRemoteTest and npm ci
- Fix backtick shell injection in validateName test loop (use process.argv)
- Make sandbox_exec fail closed with __PROBE_FAILED__ sentinel
- Add probe failure checks in C6/C7 sandbox assertions

---------

Co-authored-by: Carlos Villela <[email protected]>
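The last entry of the commit message makes `sandbox_exec` fail closed with a `__PROBE_FAILED__` sentinel. The test scripts themselves are not shown on this page, so the following is only a minimal sketch of the idea: `run_probe` and `check_no_match` are hypothetical names, and only the sentinel string comes from the commit message. The point is that a probe which could not run must not be mistaken for a probe that found nothing.

```shell
#!/usr/bin/env bash
# Sketch only: run_probe and check_no_match are hypothetical helpers, not the
# functions from the test scripts in this commit.

PROBE_FAILED="__PROBE_FAILED__" # sentinel string named in the commit message

# Run a probe command. Print its stdout on success, or the sentinel when the
# command itself fails, so a broken probe never looks like "nothing found".
run_probe() {
  local out
  if out="$("$@" 2>/dev/null)"; then
    printf '%s\n' "$out"
  else
    printf '%s\n' "$PROBE_FAILED"
  fi
}

# Pass only when the probe ran and produced no output.
check_no_match() {
  local result="$1"
  if [ "$result" = "$PROBE_FAILED" ]; then
    echo "FAIL: probe did not run"
    return 1
  fi
  if [ -n "$result" ]; then
    echo "FAIL: unexpected match: $result"
    return 1
  fi
  echo "PASS"
}
```

One caveat with this shape: a probe built on `grep` exits with status 1 when it finds no match, so it would need wrapping (for example, treating exit status 1 as success) before being passed to a helper like `run_probe`.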
1 parent 07589f8 commit bd5425c

4 files changed: +1540 -29 lines changed


.github/workflows/e2e-brev.yaml

Lines changed: 35 additions & 2 deletions
@@ -3,6 +3,28 @@

 name: e2e-brev

+# Ephemeral Brev E2E: provisions a cloud instance, bootstraps NemoClaw,
+# runs test suites remotely, then tears down. Use workflow_dispatch to
+# trigger manually from the Actions tab, or workflow_call from other workflows.
+#
+# Test suites:
+#   full                      — Install → onboard → sandbox verify → live inference
+#                               against NVIDIA Endpoints → CLI operations. Tests the
+#                               complete user journey. (~10 min, destroys sandbox)
+#   credential-sanitization   — 24 tests validating PR #743: credential stripping from
+#                               migration snapshots, auth-profiles.json deletion, blueprint
+#                               digest verification, symlink traversal protection, and
+#                               runtime sandbox credential checks. Requires running sandbox.
+#   telegram-injection        — 18 tests validating PR #584: command injection prevention
+#                               through $(cmd), backticks, quote breakout, ${VAR} expansion,
+#                               process table leak checks, and SANDBOX_NAME validation.
+#                               Requires running sandbox.
+#   all                       — Runs credential-sanitization + telegram-injection (NOT full,
+#                               which destroys the sandbox the security tests need).
+#
+# Required secrets: BREV_API_TOKEN, NVIDIA_API_KEY
+# Instance cost: Brev CPU credits (~$0.10/run for 4x16 instance)
+
 on:
   workflow_dispatch:
     inputs:
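The header comment added in this hunk says `all` runs only the two security suites and leaves out `full`, because the full run destroys the sandbox they need. A minimal sketch of that selection rule in shell, assuming a hypothetical helper `suites_for` that is not part of this commit. The two script names come from the commit message; `full-e2e` is a placeholder label, since the diff shown here does not name the script behind the full suite.

```shell
#!/usr/bin/env bash
# Sketch only: suites_for is a hypothetical helper. The two .sh names come from
# the commit message; "full-e2e" is a placeholder label, not a real file name.

# Print what to run for a given test_suite value, one entry per line.
suites_for() {
  case "$1" in
    full)
      echo "full-e2e" # placeholder: this run destroys and rebuilds the sandbox
      ;;
    credential-sanitization)
      echo "test-credential-sanitization.sh"
      ;;
    telegram-injection)
      echo "test-telegram-injection.sh"
      ;;
    all)
      # Security suites only; "full" is deliberately left out.
      echo "test-credential-sanitization.sh"
      echo "test-telegram-injection.sh"
      ;;
    *)
      echo "unknown test suite: $1" >&2
      return 1
      ;;
  esac
}
```

A caller could use it as `suites_for "${TEST_SUITE:-full}"`; rejecting unknown values keeps a typo in the input from silently running nothing.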
@@ -15,14 +37,20 @@ on:
         required: false
         default: ""
       test_suite:
-        description: "Test suite to run"
+        description: "Test suite to run (see workflow header for descriptions)"
         required: true
         default: "full"
         type: choice
         options:
           - full
           - credential-sanitization
+          - telegram-injection
           - all
+      use_launchable:
+        description: "Use NemoClaw launchable (true) or bare brev-setup.sh (false)"
+        required: false
+        type: boolean
+        default: true
       keep_alive:
         description: "Keep Brev instance alive after tests (for SSH debugging)"
         required: false
@@ -41,6 +69,10 @@ on:
         required: false
         type: string
         default: "full"
+      use_launchable:
+        required: false
+        type: boolean
+        default: true
       keep_alive:
         required: false
         type: boolean
@@ -64,7 +96,7 @@ jobs:
   e2e-brev:
     if: github.repository == 'NVIDIA/NemoClaw'
     runs-on: ubuntu-latest
-    timeout-minutes: 45
+    timeout-minutes: 90
     steps:
       - name: Checkout target branch
         uses: actions/checkout@v6
@@ -110,6 +142,7 @@ jobs:
           GITHUB_TOKEN: ${{ github.token }}
           INSTANCE_NAME: e2e-pr-${{ inputs.pr_number || github.run_id }}
           TEST_SUITE: ${{ inputs.test_suite }}
+          USE_LAUNCHABLE: ${{ inputs.use_launchable && '1' || '0' }}
           KEEP_ALIVE: ${{ inputs.keep_alive }}
         run: npx vitest run --project e2e-brev --reporter=verbose
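The expression `inputs.use_launchable && '1' || '0'` turns the boolean workflow input into the string `1` or `0` before it reaches the test process as `USE_LAUNCHABLE`. How the harness reads that value is not shown in this diff, so the following is a sketch under assumptions: `bootstrap_path` is a hypothetical helper, and treating an unset variable as `1` simply mirrors the input's default of `true`.

```shell
#!/usr/bin/env bash
# Sketch only: bootstrap_path is a hypothetical helper, not code from this commit.

# Map USE_LAUNCHABLE to a bootstrap path name. Unset is treated as "1" here,
# mirroring the workflow input's default of true (an assumption about the harness).
bootstrap_path() {
  case "${USE_LAUNCHABLE:-1}" in
    1) echo "launchable" ;;
    0) echo "legacy" ;;
    *)
      echo "USE_LAUNCHABLE must be 0 or 1, got: ${USE_LAUNCHABLE}" >&2
      return 1
      ;;
  esac
}
```

Accepting only `0` and `1` means a stray value such as `true` is reported instead of silently selecting a path.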
