add quipbench :)

630bd0c
Open

Add QuipBench :) #10

MacroscopeApp / Macroscope - Correctness Check completed Feb 23, 2026 in 6m 38s

4 issues identified (71 code objects reviewed).

• Merge Base: 797d053
• Head: 630bd0c

Details

| File Path | Comments Posted |
| --- | --- |
| bench/README.md | 0 |
| bench/config.ts | 0 |
| bench/dashboard/app.js | 2 |
| bench/dashboard/index.html | 0 |
| bench/dashboard/styles.css | 0 |
| bench/db.ts | 0 |
| bench/elo.ts | 0 |
| bench/export.ts | 0 |
| bench/finalize-partial.ts | 0 |
| bench/leaderboard.ts | 0 |
| bench/models.ts | 0 |
| bench/open.ts | 1 |
| bench/run.ts | 1 |
| bench/types.ts | 0 |

Filtered Issues Details

bench/dashboard/app.js
  • line 6: In logoFor, name.includes() throws a TypeError if name is null or undefined. If row.modelName is ever missing from a leaderboard entry, the call from rowHtml on line 24 crashes. [ Low confidence ]
  • line 34: In rowHtml, row.elo.toFixed(2) throws a TypeError if row.elo is undefined or null. Nothing validates that each row object in the leaderboard array has a numeric elo property. [ Low confidence ]
  • line 37: In rowHtml, row.winRate.toFixed(2) throws a TypeError if row.winRate is undefined or null. Nothing validates that each row object has a numeric winRate property. [ Low confidence ]
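
One way to address the three app.js findings above is to route the risky calls through small guard helpers. The following is a minimal sketch, not the code in bench/dashboard/app.js: the helper names formatFixed and nameIncludes and the "n/a" fallback are assumptions made for illustration, and the real row shape may differ.

```typescript
// Hypothetical helpers; names and the "n/a" fallback are illustrative only.

// Format a possibly-missing number without throwing.
// Returns the fallback for null, undefined, NaN, or Infinity.
function formatFixed(
  value: number | null | undefined,
  digits: number = 2,
  fallback: string = "n/a",
): string {
  return typeof value === "number" && Number.isFinite(value)
    ? value.toFixed(digits)
    : fallback;
}

// Substring check that treats a missing name as "no match"
// instead of throwing a TypeError.
function nameIncludes(name: string | null | undefined, needle: string): boolean {
  return typeof name === "string" && name.includes(needle);
}
```

With helpers like these, rowHtml could call formatFixed(row.elo) and formatFixed(row.winRate), and logoFor could call nameIncludes(name, ...), so a malformed leaderboard entry renders a placeholder rather than breaking the whole table.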
bench/db.ts
  • line 233: The replaceRatings function executes a DELETE operation immediately on line 233, which is outside the transaction scope defined at line 257. In bun:sqlite, statements outside a transaction block are auto-committed. If the subsequent insertion transaction tx(leaderboard) fails (e.g., due to a constraint violation, system crash, or error within stmt.run), the deletions will have already been permanently committed while the new insertions are rolled back. This breaks atomicity and results in data loss (an empty ratings list) for the given runId. The DELETE statement should be executed inside the transaction callback. [ Skipped comment generation ]
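
The fix suggested for replaceRatings can be sketched as follows, with the DELETE moved inside the transaction callback so that it rolls back together with the inserts. This is a sketch under stated assumptions, not the code in bench/db.ts: the table and column names, the RatingRow shape, and the function signature are hypothetical. The Db interface is declared locally to mirror the prepare() and transaction() methods that bun:sqlite's Database provides; passing a real Database may need a cast.

```typescript
// Minimal structural view of the database handle (assumed shape).
interface Statement {
  run(...params: unknown[]): unknown;
}

// Hypothetical row shape; the real leaderboard entries may differ.
interface RatingRow {
  modelName: string;
  elo: number;
  winRate: number;
}

interface Db {
  prepare(sql: string): Statement;
  // Wraps fn so that it runs inside a transaction and rolls back if fn throws.
  transaction(fn: (rows: RatingRow[]) => void): (rows: RatingRow[]) => void;
}

function replaceRatings(db: Db, runId: string, leaderboard: RatingRow[]): void {
  // Table and column names are assumptions made for illustration.
  const del = db.prepare("DELETE FROM ratings WHERE run_id = ?");
  const ins = db.prepare(
    "INSERT INTO ratings (run_id, model_name, elo, win_rate) VALUES (?, ?, ?, ?)",
  );

  const tx = db.transaction((rows: RatingRow[]) => {
    // The delete now runs inside the transaction, so a failed insert
    // also undoes the delete and the previous ratings survive.
    del.run(runId);
    for (const row of rows) {
      ins.run(runId, row.modelName, row.elo, row.winRate);
    }
  });

  tx(leaderboard);
}
```

Preparing both statements outside the callback keeps the transaction body short, while executing them only inside it is what restores atomicity for the given runId.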