feat(benchmark): add multipass VM integration test suite#830
feat(benchmark): add multipass VM integration test suite#830pszymkowiak wants to merge 4 commits intodevelopfrom
Conversation
Bun/TypeScript orchestrator that creates an Ubuntu 24.04 VM via multipass, installs all dev tools (Rust, Go, Node, Python, .NET, Terraform, etc.), builds RTK, and runs 103 tests across 11 phases: - Cargo quality (fmt, clippy, test) - 47 Rust built-in commands (git, ls, grep, cargo, pytest, go, tsc...) - 21 TOML filter commands (df, ps, shellcheck, hadolint, helm...) - Hook rewrite engine (17 rewrite assertions) - Exit code preservation - Token savings verification (avg 81%) - Pipe compatibility - Edge cases (unicode, ANSI, empty output) - Performance (memory < 20MB) - Concurrency (10 parallel executions) Usage: bun run scripts/benchmark/run.ts # Full suite (~3 min) bun run scripts/benchmark/run.ts --quick # Skip perf/concurrency bun run scripts/benchmark/run.ts --phase 3 # Single phase bun run scripts/benchmark/cleanup.ts # Delete VM bun run scripts/benchmark/rebuild.ts # Fast rebuild Prerequisites: brew install multipass, bun Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
…lure - Detect binary files (null bytes in first 8KB) before filtering - Show clear message: [binary file] path (size) instead of empty output - Fallback to raw content if filter produces empty output on non-empty file - Prevents LLM from concluding a 70MB file is "empty" (was #822) Fixes #822 Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
|
Good contribution — real Linux integration tests fill a gap that macOS unit tests can't cover. The VM reuse pattern is practical, cloud-init provisioning is thorough, and the phase-based architecture (vm/test/report separation) is clean. The A few things worth addressing before merge: Binary size limit changed without docs update The PR bumps the limit to 8MB but CLAUDE.md specifies
For Phase 2, the Known RTK bug documented instead of fixed // NOTE: rtk err swallows exit code (known bug) — test output only, not exit code
await testCmd("runner:err", `${RTK} err false`, "any");This should be a separate tracked issue, not a permanent workaround in the test suite.
const phaseOnly = args.includes("--phase")
? parseInt(args[args.indexOf("--phase") + 1], 10)
: null;If
Two failing tests shouldn't be an acceptable release state — it creates a budget for flaky tests to accumulate. Either fix the flaky tests or skip them explicitly with a reason. No CI integration is understandable (multipass in GHA is non-trivial), but worth a follow-up issue to wire this into the pre-release workflow. |
- Strict exit codes: cargo/python/go tests expect exact exit codes instead of "any" (catches real regressions) - Binary size limit documented: 8MB for ARM Linux VM vs 5MB x86 stripped - --phase NaN guard: error message instead of silent no-op - Verdict: 0 failures = READY, any failure = NOT READY (no more "minor issues" budget) - rtk err exit code bug tracked in #846 Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
|
Thanks for the thorough review @FlorianBruniaux — all 5 points addressed in e91a4dc:
Current run: 101/103 (98%) — the 2 failures are |
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
|
Hey We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes No logic changes — only file moves and import path updates. What you need to doRebase your branch on git fetch origin && git rebase origin/developGit detects renames automatically. If you get import conflicts, update the paths: use crate::git; // now: use crate::cmds::git::git;
use crate::tracking; // now: use crate::core::tracking;
use crate::config; // now: use crate::core::config;
use crate::init; // now: use crate::hooks::init;
use crate::gain; // now: use crate::analytics::gain;Need help rebasing? Tag @aeppling |
Summary
Test phases
Usage
Prerequisites
brew install multipass # bun already required by projectTest plan