test: improve CLI test determinism and remove redundant test logic #1123

Open
ksapru wants to merge 5 commits into NVIDIA:main from ksapru:fix/dead-code-cleanup-v2

@ksapru ksapru commented Mar 30, 2026

Summary

Improves test determinism, consistency, and reliability across CLI, uninstall, and blueprint test suites by standardizing shell invocation, tightening execution patterns, and removing redundant or outdated test code.


Related Issue

Fixes #977 (part 1)


Changes

  • Normalize shell invocation:
    • Replace bash -lc with bash -c in uninstall tests to avoid shell initialization side effects
  • Improve CLI test stability:
    • Increase timeouts for long-running commands
    • Standardize usage of runWithEnv(..., timeout)
  • Remove redundant / outdated test code:
    • Clean up unused or deprecated test logic in runner.test.ts
  • Improve test consistency:
    • Align execution patterns across CLI and uninstall tests
  • Preserve security coverage:
    • Maintain regression protections (e.g., path validation and credential handling)

Verification

  • npm test passes locally
  • npx prek run --all-files passes in CI
  • No changes to CLI behavior or runtime logic
  • Existing security and regression tests continue to pass

Rationale

Some tests relied on shell initialization behavior (bash -lc) and inconsistent execution patterns, leading to flakiness and non-deterministic outcomes.

These updates:

  • eliminate shell-dependent variability
  • standardize execution across test suites
  • improve reliability without impacting functionality
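
To make the shell-initialization point concrete, here is a minimal sketch (not taken from the PR's test files) of how bash -lc can pick up state that bash -c does not see. The throwaway HOME, the .bash_profile contents, and the DEMO_MARKER variable are all illustrative assumptions.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sandbox: a throwaway HOME whose profile exports a marker variable.
const home = fs.mkdtempSync(path.join(os.tmpdir(), "login-shell-demo-"));
fs.writeFileSync(path.join(home, ".bash_profile"), "export DEMO_MARKER=from-profile\n");

// Start from the parent environment, but drop anything that would blur the comparison.
const env = { ...process.env, HOME: home };
delete env.DEMO_MARKER;
delete env.BASH_ENV; // non-interactive bash sources $BASH_ENV if it is set

// Run the same script with different bash flags and return what it printed.
function probe(flags) {
  const result = spawnSync("bash", [flags, 'printf "%s" "${DEMO_MARKER:-}"'], {
    encoding: "utf-8",
    env,
  });
  return result.stdout;
}

const loginOutput = probe("-lc"); // login shell: reads /etc/profile, then ~/.bash_profile
const plainOutput = probe("-c"); // plain shell: skips the profile files

console.log({ loginOutput, plainOutput });
fs.rmSync(home, { recursive: true, force: true });
```

With -lc the result depends on whatever profile files exist for the HOME in effect, which is the kind of machine-specific variability the change to -c is meant to remove.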

Additionally, minor cleanup removes redundant or outdated test code to improve maintainability.


Risk Assessment

Low risk

  • Changes are limited to test code and execution behavior
  • No production code paths modified
  • Security and regression coverage preserved

Rollback

  • Fully reversible by reverting test changes

Type of Change

  • Test / infrastructure improvement (no behavioral change)
  • Code cleanup / maintenance

Testing

  • npm test passes
  • npx prek run --all-files passes (CI)

Checklist

General

  • Contributing guide followed

Code Changes

  • Formatters applied
  • No user-facing behavior changes
  • No secrets committed

Summary by CodeRabbit (updated)

  • Tests
    • Improved CLI and uninstall test determinism by standardizing shell invocation
    • Increased timeouts to reduce flakiness in long-running test cases
    • Removed redundant or outdated test logic for improved maintainability

Summary by CodeRabbit

  • Tests
    • Enhanced TypeScript type safety in test mocks across the blueprint module.
    • Refactored test setup utilities and assertions for improved clarity and maintainability.
    • Streamlined test environment configuration and execution patterns.

Contributor

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e21727d4-0123-4ffd-aabf-036180f35e30

📥 Commits

Reviewing files that changed from the base of the PR and between e965901 and 8fa35a6.

📒 Files selected for processing (4)
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/snapshot.test.ts
  • nemoclaw/src/blueprint/state.test.ts
  • test/uninstall.test.js
✅ Files skipped from review due to trivial changes (2)
  • nemoclaw/src/blueprint/state.test.ts
  • nemoclaw/src/blueprint/runner.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemoclaw/src/blueprint/snapshot.test.ts

📝 Walkthrough

Type annotations are added to TypeScript filesystem mocks across three test files to explicitly cast importOriginal() results. A shared snapshot path constant is introduced, and test assertions are refactored in one file. The uninstall test is refactored with npm stub simplification and bash command flag changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Blueprint test type annotations: nemoclaw/src/blueprint/runner.test.ts, nemoclaw/src/blueprint/state.test.ts | Explicitly cast importOriginal() to typeof import("node:fs") in vi.mock factory functions for improved TypeScript typing of mocked filesystem methods. |
| Snapshot test refactoring: nemoclaw/src/blueprint/snapshot.test.ts | Introduce shared SNAP constant ("/snap/20260323") and use it in snapshot path assertions; refactor createSnapshot test to use .not.toBeNull() and non-null assertion (result!) instead of inline throw-on-null check. |
| Uninstall test cleanup: test/uninstall.test.js | Refactor npm stub creation to single-line writeFileSync call; change bash invocation from bash -lc to bash -c in three spawnSync calls; add explicit HOME: tmp environment variable override in test setup and cleanup; remove inline descriptive comments; minor formatting adjustments. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Our tests now type more crisply,
With snapshots grouped so niftily,
bash flags trimmed, env set right,
Dead code's tests shine ever bright!
hops cheerfully 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR does not address issue #977's core objective to decide on dead code remediation (wire in, remove, or keep with documentation). | Clarify how this PR relates to #977: does it prepare tests for module removal, wire them into the CLI, or intend to keep them documented? Currently, only test changes appear without answering the issue's remediation question. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | While test modernization (bash -c vs bash -lc, timeout improvements) aligns with the PR summary, the connection to dead code cleanup (#977) remains unclear. | Provide explicit clarification: are test changes meant to stabilize tests before module removal, or is this a preparatory step for a different decision on the dead modules? |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: test improvements focused on CLI determinism and removal of redundant test logic across test files. |

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemoclaw/src/blueprint/runner.test.ts (1)

577-641: ⚠️ Potential issue | 🟠 Major

Please restore regression coverage for apply --plan rejection.

main no longer tests the unsupported --plan path, but runtime still rejects it in actionApply. This leaves CLI parse/dispatch behavior unguarded.

Proposed test addition
   describe("main (CLI)", () => {
@@
     it("parses apply with --profile and --endpoint-url", async () => {
       await main(["apply", "--profile", "default", "--endpoint-url", "https://override.test/v1"]);
       expect(mockedValidateEndpoint).toHaveBeenCalledWith("https://override.test/v1");
       expect(stdoutText()).toContain("PROGRESS:100:Apply complete");
     });
+
+    it("rejects apply when --plan is provided (not yet implemented)", async () => {
+      await expect(
+        main(["apply", "--profile", "default", "--plan", "/tmp/plan.json"]),
+      ).rejects.toThrow(/--plan is not yet implemented/);
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.test.ts` around lines 577 - 641, Add a test in
the existing "main (CLI)" suite that exercises the unsupported apply --plan
path: call main with arguments like ["apply","--plan","some-plan.json"] (after
the existing beforeEach setup) and assert it rejects with an error containing
"--plan" (or the exact rejection text emitted by actionApply); this restores
regression coverage for the main -> actionApply dispatch path and ensures CLI
parsing still rejects the --plan option at runtime.
test/cli.test.js (1)

16-27: ⚠️ Potential issue | 🔴 Critical

Critical bug: spawnSync is misconfigured — tests will fail with TypeError: r.out.includes is not a function.

The current implementation has multiple issues:

  1. spawnSync does not throw on a non-zero exit. Unlike execSync, it returns a result object {status, stdout, stderr, error, ...}, where error is set only if the process could not be spawned or timed out. The try-catch block therefore never handles non-zero exits, and the returned out is the full result object, which has no .includes() method.

  2. Tests call .includes() on an object — every test assertion like r.out.includes("Getting Started") will fail at runtime with TypeError: r.out.includes is not a function.

  3. Missing shell: true — without it, spawnSync treats the string as a literal executable name (looking for a file named node "${CLI}" ${args}), resulting in ENOENT instead of executing the shell command.

To fix, use:

Corrected implementation
 function runWithEnv(args, env = {}, timeout = 10000) {
-  try {
-    const out = spawnSync(`node "${CLI}" ${args}`, {
-      encoding: "utf-8",
-      timeout,
-      env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
-    });
-    return { code: 0, out };
-  } catch (err) {
-    return { code: err.status, out: (err.stdout || "") + (err.stderr || "") };
-  }
+  const result = spawnSync(`node "${CLI}" ${args}`, {
+    shell: true,
+    encoding: "utf-8",
+    timeout,
+    env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
+  });
+  const out = (result.stdout || "") + (result.stderr || "");
+  return { code: result.status ?? 1, out };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 16 - 27, The runWithEnv function misuses
spawnSync: it never throws, returns a result object (so tests calling
r.out.includes fail), and the command string needs shell: true; fix runWithEnv
by calling spawnSync with shell: true (or pass command and args as an array),
then read the returned result.stdout/stderr (convert to string) and
result.status/result.error to determine exit code; return { code: <numeric
status or error.status>, out: <stdout + stderr as string> } so callers can
safely call r.out.includes; update references in runWithEnv to use the result
object fields instead of assuming spawnSync throws.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10c32d50-fb8f-495a-8091-6c998082c50e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a97e89 and e965901.

📒 Files selected for processing (5)
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/snapshot.test.ts
  • nemoclaw/src/blueprint/state.test.ts
  • test/cli.test.js
  • test/uninstall.test.js

@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch from e965901 to 93ab0bb Compare March 30, 2026 21:46
Contributor

@cv cv left a comment

Thanks — the determinism goal here makes sense, but I think this needs a bit more work before merge.

Two blockers from the current diff:

  1. test/cli.test.js: the execSync -> spawnSync swap is not equivalent as written. spawnSync(`node "${CLI}" ${args}`, ...) will try to execute a binary with that full string as the executable name unless shell: true is set, so this should hit ENOENT. Also, spawnSync returns { status, stdout, stderr, error } and does not throw on non-zero exit, so the helper now returns { code: 0, out: resultObj } on success instead of a string, and the existing error path no longer matches execSync semantics. I think this is the likely cause of the failing test-unit job. If we want spawnSync here, I would switch to spawnSync("node", [CLI, ...args], ...) and rebuild the helper around status/stdout/stderr/error.

  2. nemoclaw/src/blueprint/runner.test.ts: I don’t think the --plan test is redundant yet. runner.ts on current main still explicitly throws "--plan is not yet implemented..." in actionApply(), so removing this test drops coverage for behavior that still exists in production code.
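
The spawnSync semantics described in point 1 can be sketched as follows. This is a hedged illustration rather than the helper from test/cli.test.js: runNode is a hypothetical name, it takes its arguments as an array (the existing runWithEnv takes a string), and it runs inline node -e scripts instead of the real CLI.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: pass the executable and its arguments separately,
// so no shell is needed and nothing is parsed as a command string.
function runNode(args, env = {}, timeout = 10000) {
  const result = spawnSync(process.execPath, args, {
    encoding: "utf-8",
    timeout,
    env: { ...process.env, ...env },
  });
  // spawnSync reports failures on the result object instead of throwing.
  const out = (result.stdout || "") + (result.stderr || "");
  return { code: result.status ?? 1, out, error: result.error };
}

const ok = runNode(["-e", "console.log('hello')"]);
const failed = runNode(["-e", "console.error('boom'); process.exit(3)"]);

// Without shell: true, the whole string is treated as the executable name.
const misuse = spawnSync(`node "some-cli.js" --help`, { encoding: "utf-8" });

console.log(ok, failed, misuse.error && misuse.error.code);
```

Because out is always a string here, callers can keep using r.out.includes(...), and a non-zero exit shows up as code rather than as a thrown exception.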

Optional follow-up: the bash -lc -> bash -c direction in test/uninstall.test.js seems reasonable, but the file is not Prettier-clean right now, which may explain the red lint job. Also, for the HOME cases, setting HOME via env is safer than embedding HOME="..." source ... inside the command string.
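
As a sketch of the HOME suggestion above, the snippet below passes HOME through the env option of spawnSync rather than writing it into the bash -c string. The temporary directory and the printed script are assumptions for illustration; the real uninstall tests source a script that is not shown here.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical temp HOME; the space and quote in the name would need careful
// escaping if the path were spliced into the command string.
const tmpHome = fs.mkdtempSync(path.join(os.tmpdir(), "home with 'quote' "));

// HOME travels through the environment, so the command string stays constant.
const viaEnv = spawnSync("bash", ["-c", 'printf "%s" "$HOME"'], {
  encoding: "utf-8",
  env: { ...process.env, HOME: tmpHome },
});

console.log(viaEnv.status, viaEnv.stdout === tmpHome);
fs.rmSync(tmpHome, { recursive: true, force: true });
```

The command string never changes with the path, so there is no quoting to get wrong and the override is visible in one place in the test setup.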

Happy to re-review once those are addressed.

@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch from 07e6b03 to 8324977 Compare March 30, 2026 21:54
Author

ksapru commented Mar 31, 2026

I’ve reverted the execSync → spawnSync change in test/cli.test.js. The previous swap wasn’t equivalent (as you pointed out: ENOENT risk + different return/error semantics), and keeping execSync preserves the current behavior and test expectations.
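
For contrast with spawnSync, here is a minimal sketch of the execSync pattern that the revert keeps. It is an assumption about the general shape of such a helper, not a copy of runWithEnv; runShell is a hypothetical name and the commands are inline node -e scripts.

```javascript
const { execSync } = require("node:child_process");

// Hypothetical helper: execSync runs the string through a shell, returns stdout
// on success, and throws on a non-zero exit, so try/catch is meaningful here.
function runShell(command, timeout = 10000) {
  try {
    const out = execSync(command, { encoding: "utf-8", timeout, stdio: "pipe" });
    return { code: 0, out };
  } catch (err) {
    // The thrown error carries the child's exit status and captured output.
    return { code: err.status, out: (err.stdout || "") + (err.stderr || "") };
  }
}

const node = JSON.stringify(process.execPath); // quote the path for the shell
const ok = runShell(`${node} -e "console.log(42)"`);
const failed = runShell(`${node} -e "console.error(7); process.exit(2)"`);

console.log(ok, failed);
```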

Also re-added the --plan test to retain coverage for the existing behavior in runner.ts.

For test/uninstall.test.js, I switched to bash -c and moved HOME into env for more deterministic behavior. I’ll make sure the file is Prettier-clean as well.

Happy to revisit a proper spawnSync refactor separately if that’s something we want to pursue.

@ksapru ksapru requested a review from cv March 31, 2026 13:07

Development

Successfully merging this pull request may close these issues.

Dead plugin modules: 4 TypeScript files ship in dist/ but are unreachable at runtime

2 participants