Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
df902b5
poc commit
BrianWhitneyAI Apr 9, 2026
34bc28f
feat: query benchmark system with schema variation and cloud fixture
BrianWhitneyAI Apr 15, 2026
378bf71
Switch benchmark workflow to manual workflow_dispatch
BrianWhitneyAI Apr 15, 2026
2931ebc
refactor: wire benchmark queries to actual service SQL builders
BrianWhitneyAI Apr 15, 2026
521e3db
fix: pass waitForFunction timeout as options arg, not page function arg
BrianWhitneyAI Apr 15, 2026
2352424
fix: use round-robin shuffled timing to eliminate cache dilution betw…
BrianWhitneyAI Apr 15, 2026
7868f2d
chore: use branch names in job titles and comparison table headers
BrianWhitneyAI Apr 15, 2026
9149e11
fix: use BENCHMARK_BRANCH instead of GITHUB_REF_NAME in workflow
BrianWhitneyAI Apr 15, 2026
25e267e
fix: run both benchmarks sequentially on the same CI runner
BrianWhitneyAI Apr 15, 2026
c565ae0
refactor: benchmark against parquet views to match production query path
BrianWhitneyAI Apr 15, 2026
784919e
fix: copy fixture buffer before registerFileBuffer transfers ownership
BrianWhitneyAI Apr 15, 2026
55b6c7b
fix: re-export fixture from view instead of reusing transferred buffer
BrianWhitneyAI Apr 15, 2026
36f86c0
feat: drop 10k scale, expand cloud benchmark to all scales
BrianWhitneyAI Apr 15, 2026
b454a4e
fix: cap cloud fixtures at 1M rows to avoid CDP transfer limit
BrianWhitneyAI Apr 16, 2026
b3d1449
docs: explain why benchmark runs sequentially on the same VM
BrianWhitneyAI Apr 16, 2026
794c039
feat: service-layer task benchmark with query timing instrumentation
BrianWhitneyAI Apr 22, 2026
f49eb3d
fix: move fixture download after checkout, add cache
BrianWhitneyAI Apr 22, 2026
4d209f5
feat: accurate DuckDB-internal query timing for benchmark
BrianWhitneyAI Apr 23, 2026
fb85966
cleanup
BrianWhitneyAI Apr 24, 2026
a0b20a8
merge: resolve conflicts with main
BrianWhitneyAI Apr 24, 2026
ee5c03f
cleanup
BrianWhitneyAI Apr 27, 2026
5bd2bbd
benchmark documentation
BrianWhitneyAI Apr 27, 2026
794eba8
add datetime filter, update badge thresholds
BrianWhitneyAI Apr 27, 2026
e8af72d
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI Apr 27, 2026
47bada0
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI Apr 29, 2026
6a8f88a
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI May 4, 2026
cef7f2b
Update packages/web/src/services/DatabaseServiceWeb/duckdb-worker.wor…
BrianWhitneyAI May 6, 2026
13a5fab
Update packages/web/src/services/DatabaseServiceWeb/duckdb-worker.wor…
BrianWhitneyAI May 6, 2026
5626d91
Update packages/web/src/services/DatabaseServiceWeb/duckdb-worker.wor…
BrianWhitneyAI May 6, 2026
8d7b4ef
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI May 6, 2026
86d73a5
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI May 8, 2026
ba763a0
Remove unnecessary service exports and add benchmark design rationale
BrianWhitneyAI May 8, 2026
f917553
Fix forbidden non-null assertions in benchmark timing loop
BrianWhitneyAI May 8, 2026
a3bfab3
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI May 8, 2026
78d8afa
Merge branch 'main' into feature/query-benchmark
BrianWhitneyAI May 21, 2026
dcb78be
Update packages/web/scripts/summarize-results.js
BrianWhitneyAI May 21, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
name: Query Benchmark

on:
workflow_dispatch:
inputs:
base_branch:
description: "Base branch to compare against"
required: false
type: string
default: "main"
compare_branch:
description: "Branch to benchmark"
required: true
type: string
iterations:
description: "Timed iterations per task (default 5)"
required: false
type: string
default: "5"
warmup:
description: "Warmup rounds before timing (default 1)"
required: false
type: string
default: "1"

permissions:
contents: read

jobs:
benchmark:
name: "Regression (${{ github.event.inputs.base_branch }} vs ${{ github.event.inputs.compare_branch }})"
runs-on: ubuntu-latest
# Both branches run sequentially in a single job on the same VM. This is intentional:
# if each branch ran in its own job, GitHub could schedule them on different physical
# machines with different CPU speeds, cache sizes, or competing workloads. A ~15%
# hardware variance between VMs would mask the small regressions we actually care about.
# Running back-to-back on the same VM ensures both measurements share the same hardware
# baseline, so deltas reflect code differences only.
# 180 minutes: fixture download + full task suite (including change_grouping on 10m) × 2 branches
timeout-minutes: 180

steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.compare_branch }}

# Fixtures cached by version; downloaded once, reused by both branch runs.
# Must come after checkout so git clean -ffdx does not wipe them.
- name: Cache benchmark fixtures
id: fixture-cache
uses: actions/cache@v4
with:
path: packages/web/fixtures
key: benchmark-fixtures-v1

- name: Download benchmark fixtures
if: steps.fixture-cache.outputs.cache-hit != 'true'
run: |
mkdir -p packages/web/fixtures
BASE=https://staging-biofile-finder-datasets.s3.us-west-2.amazonaws.com/benchmark-fixtures/v1
curl -fL "$BASE/synthetic-100k.parquet" -o packages/web/fixtures/synthetic-100k.parquet
curl -fL "$BASE/synthetic-1m.parquet" -o packages/web/fixtures/synthetic-1m.parquet
curl -fL "$BASE/synthetic-10m.parquet" -o packages/web/fixtures/synthetic-10m.parquet

- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"

- name: Install dependencies
run: npm ci

- name: Install Playwright Chromium
run: npx playwright install chromium --with-deps
working-directory: packages/web

- name: Run benchmark (${{ github.event.inputs.compare_branch }})
run: node scripts/run-regression.js --iterations ${{ github.event.inputs.iterations }} --warmup ${{ github.event.inputs.warmup }}
working-directory: packages/web
env:
BENCHMARK_BRANCH: ${{ github.event.inputs.compare_branch }}

- name: Save compare branch results
run: mv packages/web/benchmark-results-*.json /tmp/benchmark-compare.json

- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.base_branch }}
clean: false

- name: Install dependencies (base branch)
run: npm ci

- name: Run benchmark (${{ github.event.inputs.base_branch }})
run: node scripts/run-regression.js --skip-build --iterations ${{ github.event.inputs.iterations }} --warmup ${{ github.event.inputs.warmup }}
working-directory: packages/web
env:
BENCHMARK_BRANCH: ${{ github.event.inputs.base_branch }}

- name: Generate comparison
run: |
BASE_FILE=$(ls packages/web/benchmark-results-*.json | head -1)
node packages/web/scripts/compare-results.js "$BASE_FILE" /tmp/benchmark-compare.json >> "$GITHUB_STEP_SUMMARY"

- name: Upload results
if: always()
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: |
packages/web/benchmark-results-*.json
/tmp/benchmark-compare.json
retention-days: 7
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,8 @@ build
*.tgz
.env
mise.toml

# Benchmark runner output — generated by CI, not source
packages/web/benchmark-results*.json
# Generated parquet fixtures — produce with scripts/generate-fixtures.py
packages/web/fixtures/
118 changes: 118 additions & 0 deletions dev-docs/07-query-benchmarking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
Query benchmarking
==================

Three tools for measuring and monitoring DuckDB-WASM query performance.

---

Tool 1 — Local benchmark runner
--------------------------------

Runs the full task suite in headless Chromium against parquet fixtures, prints a p50/p95 timing table, and writes a result JSON for later comparison.

**First-time setup**

```bash
cd packages/web
npx playwright install chromium --with-deps
```

**Download local fixtures** (one time; ~500 MB total)

```bash
BASE=https://staging-biofile-finder-datasets.s3.us-west-2.amazonaws.com/benchmark-fixtures/v1
Comment thread
pgarrison marked this conversation as resolved.
mkdir -p packages/web/fixtures
curl -fL "$BASE/synthetic-100k.parquet" -o packages/web/fixtures/synthetic-100k.parquet
curl -fL "$BASE/synthetic-1m.parquet" -o packages/web/fixtures/synthetic-1m.parquet
curl -fL "$BASE/synthetic-10m.parquet" -o packages/web/fixtures/synthetic-10m.parquet
```

**Run against local fixtures**

```bash
# All scales
npm run benchmark --prefix packages/web -- --local

# Single scale
npm run benchmark --prefix packages/web -- --local --scale 100k

# Override iteration/warmup counts
npm run benchmark --prefix packages/web -- --local --scale 1m --iterations 10 --warmup 3
```

**Run against remote S3 parquets**

```bash
BENCHMARK_REAL_1M_URL=s3://your-bucket/file.parquet \
npm run benchmark --prefix packages/web -- --scale 1m
```

**Compare two result files**

```bash
npm run benchmark:compare --prefix packages/web -- \
packages/web/benchmark-results-main.json \
packages/web/benchmark-results-local.json
```

This prints a Markdown table with p50 deltas and regression/improvement badges (⚠️ ≥25% slower, ❌ ≥50% slower, ✅ ≥25% faster). Badges are suppressed for queries where either branch is under 500ms — percentage deltas on fast queries are noise.

**Flags**

| Flag | Description |
|---|---|
| `--local` | Use fixtures from `packages/web/fixtures/` instead of S3 URLs |
| `--scale 100k\|1m\|10m` | Run a single fixture size |
| `--full` | Run all scales with both cloud and local sources side-by-side |
| `--iterations N` | Timed iterations per task (default 5) |
| `--warmup N` | Warmup rounds before timing (default 1) |
| `--skip-build` | Skip the webpack build step |
| `--chromium` | Use Playwright's bundled Chromium instead of system Chrome |

---

Tool 2 — CI regression workflow
---------------------------------

`benchmark.yml` is a `workflow_dispatch` workflow that benchmarks two branches sequentially on the same VM and posts a Markdown comparison table to the workflow summary.

Both branches run on the same machine to eliminate hardware variance — a ~15% CPU speed difference between VMs would mask the small regressions the tool is designed to catch.

**Trigger it** from the Actions tab: select **Query Benchmark**, enter a `compare_branch` (your PR branch) and optionally override `base_branch` (default: `main`), `iterations`, and `warmup`.

The workflow:
1. Checks out the compare branch and downloads fixtures from S3 (cached by version)
2. Runs `run-regression.js` → writes `benchmark-results-<compare>.json`
3. Checks out the base branch (without wiping fixtures)
4. Runs `run-regression.js` → writes `benchmark-results-<base>.json`
5. Runs `compare-results.js` → posts the Markdown table to the step summary

---

Tool 3 — Dev console query timing
-----------------------------------

Enables per-query DuckDB timing in the running app without any build step.

**Enable**

In the browser DevTools console:

```js
localStorage.setItem("bff_query_timing", "1")
```

Then reload the page. Each DuckDB query will log its elapsed time to the console as it runs:

```
[duckdb] 12.3ms — [fetchAnnotations] SELECT DISTINCT ...
[duckdb] 4.1ms — [getFiles] SELECT * FROM ...
```

**Disable**

```js
localStorage.removeItem("bff_query_timing")
```

Then reload.
48 changes: 48 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions packages/web/benchmark/index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>BFF Benchmark</title>
</head>
<body>
<p id="status">Starting...</p>
</body>
</html>
Loading
Loading