Measures hyperpanes against other Windows terminals on the same machine, so the "did the native rewrite deliver the memory win?" question (and future regressions) rest on numbers rather than estimates. It is detect-only: it never installs, updates, or otherwise mutates your system — it benchmarks whatever is already built/installed.
The primary target is the native Rust hyperpanes (rs/crates/app). The pre-rewrite
Electron build is an optional baseline (built from branch archive/electron).
Suites: memory (idle WS/Private + idle CPU — the headline) for every app, plus throughput and startup for terminals that expose a run-a-command CLI. A manual input-latency procedure (Typometer) is documented below.
The native hyperpanes is measured idle-only. Its GUI binary (v0.0.1) ignores CLI argv and has no run-a-command flag, so the harness cannot inject an in-pane workload — it launches a fresh instance (one default-shell pane) and samples idle memory + CPU. Throughput/startup-in-pane are therefore n/a for native until the GUI wires CLI launch (the parser + single-instance gate already live in
core+ the headless daemon).
The harness is plain Node, no dependencies. Run it via node from the repo root, or
via the npm scripts in bench/ (run them from inside bench/):
# 1. See what's built/installed (writes bench/results/terminals.json)
node bench/detect.mjs # …or: cd bench; npm run bench:detect
# 2. The headline native-vs-Electron(-vs-WT) idle comparison
node bench/run.mjs --only=hyperpanes,hyperpanes-electron,wt --suite=memory --idle-bare --label=native-vs-electronnpm run bench with no flags benchmarks every detected terminal with all applicable
suites at --runs=5. Output lands in bench/results/ (gitignored).
The native app has no hyperpanes command on PATH; the harness resolves the cargo
build output. Build it first:
cargo build --release --manifest-path rs/crates/app/Cargo.tomlIt prefers rs/crates/app/target/release/hyperpanes.exe and falls back to a debug
build (noted in the report — debug is slower/larger and not a fair comparison target).
Each measured native launch gets an isolated throwaway %APPDATA% so it starts as a
clean fresh instance (the data dir keys on %APPDATA%).
The installed production app is now the native build (Electron was retired), so the
Electron baseline is built from a worktree of archive/electron next to this repo:
git worktree add ../electron-baseline archive/electron
cd ../electron-baseline; npm ci; npm run build # → out/main/index.jsThe harness then runs it in dev mode (the worktree's electron binary + out/main/index.js),
launched with an isolated --user-data-dir, which spawns the real multi-process Electron
tree (main + GPU + renderer + utility helpers) that the proctree walk sums. If the worktree
isn't built, the hyperpanes (Electron) row is simply absent (detect shows how to build it).
| Flag | Default | Meaning |
|---|---|---|
--only=ids |
all detected | comma-separated terminal ids (hyperpanes,hyperpanes-electron,wt,wezterm,alacritty,rio,conemu,tabby,hyper,wave) |
--suite=... |
throughput,startup,memory |
which suites to run |
--runs=N |
5 |
repetitions per measurement (median kept) |
--cases=... |
all six | throughput cases (dense,scrolling,scrolling-region,alt-screen,unicode,cursor-motion) |
--bytes=MB |
16 |
throughput payload size per case |
--lines=N |
200000 |
scrollback lines for the memory "after-load" phase |
--idle-bare |
off | force every targeted terminal through the bare idle sample (no in-pane workload) — identical methodology for an apples-to-apples idle memory/CPU comparison |
--cpu-ms=N |
2000 |
idle-CPU sample window in ms (0 disables CPU sampling) |
--label=name |
report |
names the <label>.json output |
--check-updates |
off | also report (not apply) winget upgrade status |
- Throughput (MB/s, higher is better). A Node reimplementation of vtebench's streaming model (vtebench itself is Rust+bash, WSL-only). The workload runs inside the terminal, streams a fixed payload to stdout honoring PTY backpressure, and times itself — because a write blocks until the terminal drains/renders, elapsed ≈ render throughput. Not byte-identical to vtebench; a representative, internally-consistent proxy.
- Startup (ms, lower is better). "Process launch → your command is executing in a
pane." The harness stamps
t0immediately before spawning; the probe stampsDate.now()the instant it runs. The constant Node-start cost cancels across terminals. - Memory (MB, lower is better). A
Win32_Processtree-walk from the spawned root PID, summing Working Set (primary) and Private Bytes (overhead). For the native app and the Electron baseline this is an idle sample of a fresh instance (one default-shell pane); for driven terminals it is sampled at idle and again after filling scrollback. The tree-walk captures Electron's multi-process tree (main + GPU + renderer + utility) and the native app's single process — so the comparison is total resident cost, not just the main process. - Idle CPU (%, lower is better). Each process's total processor time is diffed over a
fixed window (
--cpu-ms, default 2 s) and summed across the tree, expressed as percent of one core (so it can exceed 100). A short idle snapshot — sensitive to background animation/rendering — not a sustained average.
Terminals receive the workload differently. -e-style terminals take an argv array;
hyperpanes takes a single command string it re-parses through a shell. To get one
reliable quoting path, the harness writes a per-run .cmd wrapper (cmd-native quoting
is robust for the spaces-in-path node.exe) and every terminal runs cmd /c <wrapper>. The wrapper invokes the harness's own Node (process.execPath) so every
terminal runs an identical interpreter.
- Native is idle-only. The native GUI ignores CLI argv and has no run-a-command flag, so it can't be driven; only idle memory/CPU of a fresh instance are measured. Throughput/startup are n/a for native until the GUI wires CLI launch.
- Isolated fresh instances. Each hyperpanes launch uses an isolated data dir
(native: throwaway
%APPDATA%; Electron:--user-data-dir <temp>) so it never hands off to a running copy. The Electron baseline is thearchive/electronworktree run in dev mode — also idle-only, so the comparison is apples-to-apples. - Memory tree-walk. Sums Working Set + Private Bytes from the spawned root PID across
the whole tree (Electron's helpers included). May miss a reused host process —
Windows Terminal shares one host across windows and
wt.exeis a launcher stub that exits, so its row is often empty/unreliable (shown with a note) — or include unrelated windows. - Idle CPU is a short snapshot (
--cpu-ms), percent-of-one-core summed over the tree, not a sustained average. - Backpressure proxy. For driven terminals, a terminal that buffers a large PTY read can ack bytes before rendering, under-reporting render cost. vtebench shares this.
- Config-only terminals. Tabby, Hyper and Wave have no run-a-command flag → idle memory/CPU only.
- Latency is manual (Typometer — see below), not automated.
- Kitty / Ghostty have no Windows build and are excluded.
- Environment. Run on AC power with other apps closed. Results remain machine- and load-dependent.
Input latency (keypress → glyph on screen) is not automated here. To measure it:
- Install Typometer (a screen-capture keystroke-to-pixel latency tool).
- Open the terminal under test, maximized, with a shell at an empty prompt and a high-contrast color scheme (Typometer detects glyph changes by luminance).
- In Typometer, run a measurement of ~200 keystrokes; record the reported average and p99 latency (ms).
- Repeat per terminal under identical window size / font / theme, on AC power with other apps closed.
- Record the figures alongside this harness's
report.md(Typometer has no headless CLI to fold into the automated run).
terminals.json— detection snapshot (installed y/n + versions).<label>.json— full structured results for a run.report.md— human-readable tables + caveats footer.