Skip to content

New browser instance per benchmark task#808

Merged
pgarrison merged 9 commits into
mainfrom
feature/browser-per-benchmark-task
Jun 4, 2026
Merged

New browser instance per benchmark task#808
pgarrison merged 9 commits into
mainfrom
feature/browser-per-benchmark-task

Conversation

@pgarrison
Copy link
Copy Markdown
Contributor

@pgarrison pgarrison commented May 20, 2026

Purpose

Enable benchmarking cold-start scenarios by using a new browser instance for every trial. This feature is enabled only when the user sets warmups=0.

Builds on #739, #806,

Changes

  • The benchmark page (index.ts) accepts an optional list of tasks to run, and runs all tasks by default.
  • run-benchmark-page.ts now splits the logic into runInBrowser (for warmups > 0) and runColdStartBenchmarks, which makes use of runInBrowser.
  • Fixed a bug where I used in instead of has for set membership.
  • Extracted some logic to stats.ts (may need a better name) and injectFixtures for DRY and readability.

Testing

Without this change, running tests with warmups=0 shows a significant performance decrease for the first trial of each batch. The test below used 4 batches with 15 or 20 iterations each.

Before:
Screenshot 2026-05-28 at 2 36 19 PM

After this change, with warmups=0, there are no obvious outliers.

After (2 batches, 5 iterations each):
Screenshot 2026-05-29 at 10 53 40 AM

Comment on lines 5 to -24
@@ -14,14 +12,6 @@ function setStatus(msg: string) {
console.log("[benchmark]", msg);
}

// Nearest-rank percentile over a pre-sorted array. Used to report p50 and p95
// across timed iterations — p95 surfaces occasional slow outliers (GC pauses,
// DuckDB cache misses) that the median would hide.
function percentile(sorted: number[], p: number): number {
const idx = Math.ceil((p / 100) * sorted.length) - 1;
return sorted[Math.max(0, idx)];
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were moved to stats.ts for reuse.

};
} finally {
await browser.close();
await new Promise((res) => server.close(res));
Copy link
Copy Markdown
Contributor Author

@pgarrison pgarrison May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved to line 182

// no HTTP range-request overhead, so DuckDB sort performance matches real-user timing.
const loaded = new Set();
for (const source of testCases.flat()) {
if (source.label in loaded) continue; // Don't add duplicate sources
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section was moved verbatim to injectFixtures, with one exception: I changed this condition to loaded.has(source.label) to fix a bug. (in does not check set membership!)

@pgarrison pgarrison mentioned this pull request May 28, 2026
@pgarrison pgarrison marked this pull request as ready for review May 29, 2026 17:55
Base automatically changed from feature/benchmark-aggregate-parquet to main May 29, 2026 20:06
Copy link
Copy Markdown
Contributor

@BrianWhitneyAI BrianWhitneyAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved under the assumption the lint issues are cleaned up

Comment thread packages/web/scripts/lib/run-benchmark-page.ts
@pgarrison
Copy link
Copy Markdown
Contributor Author

Waiting on https://github.com/AllenInstitute/biofile-finder/actions/runs/26973097008 to complete successfully after latest adjustments

@pgarrison
Copy link
Copy Markdown
Contributor Author

Approved under the assumption the lint issues are cleaned up

Done 4576877

@pgarrison pgarrison merged commit 1dde5ac into main Jun 4, 2026
8 of 9 checks passed
@pgarrison pgarrison deleted the feature/browser-per-benchmark-task branch June 4, 2026 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants