test: improve CLI test determinism and remove redundant test logic by ksapru · Pull Request #1123 · NVIDIA/NemoClaw

ksapru · 2026-03-30T21:35:10Z

Summary

Improves test determinism, consistency, and reliability across CLI, uninstall, and blueprint test suites by standardizing shell invocation, tightening execution patterns, and removing redundant or outdated test code.

Related Issue

Fixes #977 (part 1)

Changes

Normalize shell invocation:
- Replace bash -lc with bash -c in uninstall tests to avoid shell initialization side effects
Improve CLI test stability:
- Increase timeouts for long-running commands
- Standardize usage of runWithEnv(..., timeout)
Remove redundant / outdated test code:
- Clean up unused or deprecated test logic in runner.test.ts
Improve test consistency:
- Align execution patterns across CLI and uninstall tests
Preserve security coverage:
- Maintain regression protections (e.g., path validation and credential handling)
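
The timeout item above can be illustrated with a small sketch. It assumes nothing about the repository's actual `runWithEnv` helper; `runWithTimeout` is a hypothetical name, and the snippet only shows how Node's `spawnSync` reports a child that outlives its `timeout`, which is the kind of failure a too-tight limit can cause on a slow machine.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a Node one-liner with a timeout (ms) and summarize the outcome.
function runWithTimeout(source, timeout) {
  const result = spawnSync(process.execPath, ["-e", source], {
    encoding: "utf-8",
    timeout,
  });
  return {
    status: result.status, // null when the child was killed by a signal
    timedOut: result.error?.code === "ETIMEDOUT",
    stdout: result.stdout,
  };
}

// A child that outlives its timeout is killed and reported through result.error.
const slow = runWithTimeout("setTimeout(() => {}, 5000)", 200);

// With a generous limit, a quick command finishes normally.
const fast = runWithTimeout("console.log('done')", 10000);

console.log(slow.timedOut, fast.status, fast.stdout.trim());
```

A larger limit does not slow down passing tests, since the call returns as soon as the child exits; it only changes how long a hung command is allowed to run.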

Verification

npm test passes locally
npx prek run --all-files passes in CI
No changes to CLI behavior or runtime logic
Existing security and regression tests continue to pass

Rationale

Some tests relied on shell initialization behavior (bash -lc) and inconsistent execution patterns, leading to flakiness and non-deterministic outcomes.

These updates:

- eliminate shell-dependent variability
- standardize execution across test suites
- improve reliability without impacting functionality

Additionally, minor cleanup removes redundant or outdated test code to improve maintainability.

Risk Assessment

Low risk

- Changes are limited to test code and execution behavior
- No production code paths modified
- Security and regression coverage preserved

Rollback

Fully reversible by reverting test changes

Type of Change

Test / infrastructure improvement (no behavioral change)
Code cleanup / maintenance

Testing

npm test passes
npx prek run --all-files passes (CI)

Checklist

General

Contributing guide followed

Code Changes

Formatters applied
No user-facing behavior changes
No secrets committed

Summary by CodeRabbit (updated)

Tests
- Improved CLI and uninstall test determinism by standardizing shell invocation
- Increased timeouts to reduce flakiness in long-running test cases
- Removed redundant or outdated test logic for improved maintainability

Summary by CodeRabbit

Tests
- Enhanced TypeScript type safety in test mocks across the blueprint module.
- Refactored test setup utilities and assertions for improved clarity and maintainability.
- Streamlined test environment configuration and execution patterns.

coderabbitai · 2026-03-30T21:35:31Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e21727d4-0123-4ffd-aabf-036180f35e30

📥 Commits

Reviewing files that changed from the base of the PR and between e965901 and 8fa35a6.

📒 Files selected for processing (4)

nemoclaw/src/blueprint/runner.test.ts
nemoclaw/src/blueprint/snapshot.test.ts
nemoclaw/src/blueprint/state.test.ts
test/uninstall.test.js

✅ Files skipped from review due to trivial changes (2)

nemoclaw/src/blueprint/state.test.ts
nemoclaw/src/blueprint/runner.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

nemoclaw/src/blueprint/snapshot.test.ts

📝 Walkthrough

Walkthrough

Type annotations are added to TypeScript filesystem mocks across three test files to explicitly cast importOriginal() results. A shared snapshot path constant is introduced, and test assertions are refactored in one file. The uninstall test is refactored with npm stub simplification and bash command flag changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Blueprint test type annotations: `nemoclaw/src/blueprint/runner.test.ts`, `nemoclaw/src/blueprint/state.test.ts` | Explicitly cast `importOriginal()` to `typeof import("node:fs")` in `vi.mock` factory functions for improved TypeScript typing of mocked filesystem methods. |
| Snapshot test refactoring: `nemoclaw/src/blueprint/snapshot.test.ts` | Introduce shared `SNAP` constant (`"/snap/20260323"`) and use it in snapshot path assertions; refactor `createSnapshot` test to use `.not.toBeNull()` and non-null assertion (`result!`) instead of inline throw-on-null check. |
| Uninstall test cleanup: `test/uninstall.test.js` | Refactor npm stub creation to single-line `writeFileSync` call; change bash invocation from `bash -lc` to `bash -c` in three `spawnSync` calls; add explicit `HOME: tmp` environment variable override in test setup and cleanup; remove inline descriptive comments; minor formatting adjustments. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Our tests now type more crisply,
With snapshots grouped so niftily,
bash flags trimmed, env set right,
Dead code's tests shine ever bright!
hops cheerfully 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR does not address issue `#977`'s core objective to decide on dead code remediation (wire in, remove, or keep with documentation). | Clarify how this PR relates to `#977`: does it prepare tests for module removal, wire them into the CLI, or intend to keep them documented? Currently, only test changes appear without answering the issue's remediation question. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | While test modernization (bash -c vs bash -lc, timeout improvements) aligns with the PR summary, the connection to dead code cleanup (`#977`) remains unclear. | Provide explicit clarification: are test changes meant to stabilize tests before module removal, or is this a preparatory step for a different decision on the dead modules? |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: test improvements focused on CLI determinism and removal of redundant test logic across test files. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

nemoclaw/src/blueprint/runner.test.ts (1)

577-641: ⚠️ Potential issue | 🟠 Major

Please restore regression coverage for apply --plan rejection.

main no longer tests the unsupported --plan path, but runtime still rejects it in actionApply. This leaves CLI parse/dispatch behavior unguarded.

Proposed test addition

   describe("main (CLI)", () => {
@@
     it("parses apply with --profile and --endpoint-url", async () => {
       await main(["apply", "--profile", "default", "--endpoint-url", "https://override.test/v1"]);
       expect(mockedValidateEndpoint).toHaveBeenCalledWith("https://override.test/v1");
       expect(stdoutText()).toContain("PROGRESS:100:Apply complete");
     });
+
+    it("rejects apply when --plan is provided (not yet implemented)", async () => {
+      await expect(
+        main(["apply", "--profile", "default", "--plan", "/tmp/plan.json"]),
+      ).rejects.toThrow(/--plan is not yet implemented/);
+    });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.test.ts` around lines 577 - 641, Add a test in
the existing "main (CLI)" suite that exercises the unsupported apply --plan
path: call main with arguments like ["apply","--plan","some-plan.json"] (after
the existing beforeEach setup) and assert it rejects with an error containing
"--plan" (or the exact rejection text emitted by actionApply); this restores
regression coverage for the main -> actionApply dispatch path and ensures CLI
parsing still rejects the --plan option at runtime.

test/cli.test.js (1)

16-27: ⚠️ Potential issue | 🔴 Critical

Critical bug: spawnSync is misconfigured — tests will fail with TypeError: r.out.includes is not a function.

The current implementation has multiple issues:

1. spawnSync does not throw on a non-zero exit. Unlike execSync, it always returns a result object, so the try-catch block never catches a failed command, and the returned out is the full result object {error, status, stdout, stderr, ...}, which has no .includes() method.

2. Tests call .includes() on an object. Every assertion such as r.out.includes("Getting Started") fails at runtime with TypeError: r.out.includes is not a function.

3. shell: true is missing. Without it, spawnSync treats the whole string as a literal executable name (looking for a file named node "${CLI}" ${args}), which results in ENOENT instead of running the command.

To fix, use:
Corrected implementation
 function runWithEnv(args, env = {}, timeout = 10000) {
-  try {
-    const out = spawnSync(`node "${CLI}" ${args}`, {
-      encoding: "utf-8",
-      timeout,
-      env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
-    });
-    return { code: 0, out };
-  } catch (err) {
-    return { code: err.status, out: (err.stdout || "") + (err.stderr || "") };
-  }
+  const result = spawnSync(`node "${CLI}" ${args}`, {
+    shell: true,
+    encoding: "utf-8",
+    timeout,
+    env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
+  });
+  const out = (result.stdout || "") + (result.stderr || "");
+  return { code: result.status ?? 1, out };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 16 - 27, The runWithEnv function misuses
spawnSync: it never throws, returns a result object (so tests calling
r.out.includes fail), and the command string needs shell: true; fix runWithEnv
by calling spawnSync with shell: true (or pass command and args as an array),
then read the returned result.stdout/stderr (convert to string) and
result.status/result.error to determine exit code; return { code: <numeric
status or error.status>, out: <stdout + stderr as string> } so callers can
safely call r.out.includes; update references in runWithEnv to use the result
object fields instead of assuming spawnSync throws.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10c32d50-fb8f-495a-8091-6c998082c50e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a97e89 and e965901.

📒 Files selected for processing (5)

nemoclaw/src/blueprint/runner.test.ts
nemoclaw/src/blueprint/snapshot.test.ts
nemoclaw/src/blueprint/state.test.ts
test/cli.test.js
test/uninstall.test.js

cv

Thanks — the determinism goal here makes sense, but I think this needs a bit more work before merge.

Two blockers from the current diff:

1. test/cli.test.js: the execSync -> spawnSync swap is not equivalent as written. spawnSync(`node "${CLI}" ${args}`, ...) will try to execute a binary with that full string as the executable name unless shell: true is set, so this should hit ENOENT. Also, spawnSync returns { status, stdout, stderr, error } and does not throw on non-zero exit, so the helper now returns { code: 0, out: resultObj } on success instead of a string, and the existing error path no longer matches execSync semantics. I think this is the likely cause of the failing test-unit job. If we want spawnSync here, I would switch to spawnSync("node", [CLI, ...args], ...) and rebuild the helper around status/stdout/stderr/error.
2. nemoclaw/src/blueprint/runner.test.ts: I don’t think the --plan test is redundant yet. runner.ts on current main still explicitly throws "--plan is not yet implemented..." in actionApply(), so removing this test drops coverage for behavior that still exists in production code.
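
For reference, the spawnSync direction suggested in the first blocker could look roughly like the sketch below. This is an assumption-laden illustration, not the repository's helper: `runCli` is a hypothetical name, it takes `args` as an array rather than the string that `runWithEnv` currently receives, and it uses `process.execPath` in place of a bare `"node"`.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: `cli` is the path to a script, `args` is an array of arguments.
function runCli(cli, args = [], env = {}, timeout = 10000) {
  // Passing the executable and an argument array avoids the shell entirely,
  // so no quoting of paths or arguments is needed.
  const result = spawnSync(process.execPath, [cli, ...args], {
    encoding: "utf-8",
    timeout,
    env: { ...process.env, ...env },
  });

  // spawnSync reports spawn failures and timeouts via result.error instead of throwing.
  if (result.error) {
    return { code: 1, out: String(result.error.message) };
  }

  // Callers get a plain string, so r.out.includes(...) keeps working.
  const out = (result.stdout || "") + (result.stderr || "");

  // status is null when the child was killed by a signal.
  return { code: result.status ?? 1, out };
}
```

A call such as `runCli(CLI, ["help"])` would then return `{ code, out }` with `out` as a string, but existing call sites that pass a single argument string would need to be updated to pass arrays.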

Optional follow-up: the bash -lc -> bash -c direction in test/uninstall.test.js seems reasonable, but the file is not Prettier-clean right now, which may explain the red lint job. Also, for the HOME cases, setting HOME via env is safer than embedding HOME="..." source ... inside the command string.

Happy to re-review once those are addressed.

ksapru · 2026-03-31T13:07:17Z

I’ve reverted the execSync → spawnSync change in test/cli.test.js. The previous swap wasn’t equivalent (as you pointed out: ENOENT risk + different return/error semantics), and keeping execSync preserves the current behavior and test expectations.
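
As a reminder of the semantics being preserved, here is a minimal sketch of an execSync-based helper. It is not a copy of the helper in test/cli.test.js: `runShell` is a hypothetical name, and the explicit `stdio` setting is an assumption added so that stderr is captured on failure.

```javascript
const { execSync } = require("node:child_process");

// Hypothetical helper: `command` is a full shell command string.
function runShell(command, env = {}, timeout = 10000) {
  try {
    // execSync runs the string through a shell and returns stdout when the exit code is 0.
    const out = execSync(command, {
      encoding: "utf-8",
      timeout,
      env: { ...process.env, ...env },
      stdio: ["ignore", "pipe", "pipe"], // capture stderr instead of inheriting it
    });
    return { code: 0, out };
  } catch (err) {
    // On a non-zero exit execSync throws; the error carries status, stdout, and stderr.
    return { code: err.status, out: (err.stdout || "") + (err.stderr || "") };
  }
}
```

The try/catch is what makes this shape work: execSync throws on failure, whereas spawnSync returns a result object in both cases, which is why the two are not drop-in replacements for each other.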

Also re-added the --plan test to retain coverage for the existing behavior in runner.ts.

For test/uninstall.test.js, I switched to bash -c and moved HOME into env for more deterministic behavior. I’ll make sure the file is Prettier-clean as well.
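
A minimal sketch of that pattern, assuming `bash` is on the PATH; the temporary directory prefix and the echoed command are placeholders, not the actual uninstall script invocation:

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Throwaway HOME so the command under test never touches the real home directory.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "uninstall-test-"));

// `bash -c` starts a non-login, non-interactive shell, so login startup files such as
// ~/.bash_profile are not read (bash still sources $BASH_ENV if that variable is set).
// HOME is supplied through env rather than spliced into the command string.
const result = spawnSync("bash", ["-c", 'printf "%s" "$HOME"'], {
  encoding: "utf-8",
  env: { ...process.env, HOME: tmp },
});

console.log(result.status, result.stdout === tmp);

fs.rmSync(tmp, { recursive: true, force: true });
```

Keeping HOME in `env` also sidesteps quoting problems when the temporary path contains spaces or shell metacharacters.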

Happy to revisit a proper spawnSync refactor separately if that’s something we want to pursue.

test: improve CLI test determinism and remove redundant test logic

8bf5781

ksapru mentioned this pull request Mar 30, 2026

fix(cli): remove dead --plan logic and unused snapshot modules, optimize test timeouts #1030

Closed

7 tasks

test: improve CLI test determinism and remove redundant test logic

93ab0bb

coderabbitai bot reviewed Mar 30, 2026

ksapru force-pushed the fix/dead-code-cleanup-v2 branch from e965901 to 93ab0bb Compare March 30, 2026 21:46

cv requested changes Mar 30, 2026

test: improve CLI test determinism and remove redundant test logic

8324977

ksapru force-pushed the fix/dead-code-cleanup-v2 branch from 07e6b03 to 8324977 Compare March 30, 2026 21:54

ksapru added 2 commits March 30, 2026 17:54

Merge branch 'main' into fix/dead-code-cleanup-v2

8fa35a6

Merge branch 'main' into fix/dead-code-cleanup-v2

5806eee

ksapru requested a review from cv March 31, 2026 13:07

ksapru wants to merge 5 commits into NVIDIA:main from ksapru:fix/dead-code-cleanup-v2
