test: source-level query regressions to verify benchmark detection by BrianWhitneyAI · Pull Request #740 · AllenInstitute/biofile-finder

BrianWhitneyAI · 2026-04-15T20:22:36Z

Summary

Regression test PR for the benchmark system (PR #739). It introduces five source-level changes of the kind a developer might plausibly commit, to verify that the benchmark detects them automatically against parquet-backed views at realistic data scales.

Because queries.ts calls the same build*SQL functions as the app, no changes to the benchmark code were needed. All regressions are picked up purely from changes to packages/core/.

Changes (all in `packages/core/`)

1. SQLBuilder.regexMatchValueInList — wrap column in LOWER() for case-insensitive matching

// Before
REGEXP_MATCHES(CAST("col" AS VARCHAR), '...')
// After
REGEXP_MATCHES(LOWER(CAST("col" AS VARCHAR)), LOWER('...'))

Forces a LOWER() call on every scanned row. Affects: text_search, multi_column_filter.
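
A minimal TypeScript sketch of the Before/After SQL for change 1. `regexMatchSql` is a hypothetical helper, not the real `SQLBuilder.regexMatchValueInList` signature, and it assumes the caller passes an already-escaped regex pattern. It makes the cost visible: `LOWER()` ends up wrapped around the column expression, so it runs for every scanned row.

```typescript
// Hypothetical helper; the real SQLBuilder.regexMatchValueInList signature may differ.
// `pattern` is assumed to be an already-escaped regex string.
function regexMatchSql(column: string, pattern: string, caseInsensitive: boolean): string {
  const quotedCol = `"${column.replace(/"/g, '""')}"`; // escape embedded double quotes
  const literal = `'${pattern.replace(/'/g, "''")}'`; // escape embedded single quotes
  const colExpr = `CAST(${quotedCol} AS VARCHAR)`;
  if (!caseInsensitive) {
    // Before: the cast column is matched directly.
    return `REGEXP_MATCHES(${colExpr}, ${literal})`;
  }
  // After: LOWER() now wraps the column, so it runs on every scanned row;
  // the pattern side is a constant.
  return `REGEXP_MATCHES(LOWER(${colExpr}), LOWER(${literal}))`;
}

console.log(regexMatchSql("col", "abc", false));
// REGEXP_MATCHES(CAST("col" AS VARCHAR), 'abc')
console.log(regexMatchSql("col", "abc", true));
// REGEXP_MATCHES(LOWER(CAST("col" AS VARCHAR)), LOWER('abc'))
```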

2. buildDistinctValuesSQL — add ORDER BY 1 for sorted dropdown values

// Before: SELECT DISTINCT "cell_line" FROM "table"
// After:  SELECT DISTINCT "cell_line" FROM "table" ORDER BY 1

Adds a sort pass after hash-distinct. Affects: distinct_values (especially on the wide schema with high-cardinality columns).
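
The same idea for change 2, as a hypothetical `distinctValuesSql` builder (an assumption for illustration, not the actual `buildDistinctValuesSQL` signature) that reproduces the Before/After strings:

```typescript
// Quote a SQL identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical builder mirroring the Before/After SQL of change 2.
function distinctValuesSql(table: string, column: string, sorted: boolean): string {
  const base = `SELECT DISTINCT ${quoteIdent(column)} FROM ${quoteIdent(table)}`;
  // ORDER BY 1 refers to the first (and only) output column. It adds a sort
  // over all distinct values, so its cost grows with the column's cardinality.
  return sorted ? `${base} ORDER BY 1` : base;
}

console.log(distinctValuesSql("table", "cell_line", true));
// SELECT DISTINCT "cell_line" FROM "table" ORDER BY 1
```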

3. buildFetchAnnotationsSQL — add ORDER BY column_name for predictable schema ordering

// Before: SELECT column_name, data_type FROM information_schema.columns WHERE ...
// After:  ... ORDER BY column_name

Adds a sort pass on the schema introspection result. Affects: fetch_annotations.
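
For change 3, the original WHERE clause is not shown in this PR, so this hypothetical `fetchAnnotationsSql` sketch takes the predicate as a parameter rather than guessing it. The `"1 = 1"` predicate in the usage line is a placeholder only.

```typescript
// Hypothetical sketch; the real buildFetchAnnotationsSQL filters
// information_schema.columns with a WHERE clause not shown in this PR,
// so the caller supplies the predicate here.
function fetchAnnotationsSql(wherePredicate: string, sorted: boolean): string {
  const base =
    "SELECT column_name, data_type FROM information_schema.columns " +
    `WHERE ${wherePredicate}`;
  // The result has one row per schema column, so this sort runs over a small
  // set, consistent with the modest fetch_annotations delta reported here.
  return sorted ? `${base} ORDER BY column_name` : base;
}

console.log(fetchAnnotationsSql("1 = 1", true));
// SELECT column_name, data_type FROM information_schema.columns WHERE 1 = 1 ORDER BY column_name
```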

4. buildGetCountSQL — use COUNT(DISTINCT hidden_bff_uid) instead of COUNT(*)

// Before: SELECT COUNT(*) AS num_files FROM "table"
// After:  SELECT COUNT(DISTINCT "hidden_bff_uid") AS num_files FROM "table"

Forces a full hash aggregation over all rows instead of a simple counter. Affects: count_all. The regression grows sharply with row count: at 10M rows, 6ms → 1001ms (roughly +16,000%).
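
Change 4 as a hypothetical `countSql` builder (not the real `buildGetCountSQL` signature). Assuming `hidden_bff_uid` is unique and non-null for every row, both variants return the same number, which is why this edit looks harmless in review while changing how the engine has to compute the result.

```typescript
// Hypothetical builder; not the actual buildGetCountSQL signature.
function countSql(table: string, distinctUid: boolean): string {
  const t = `"${table.replace(/"/g, '""')}"`;
  // COUNT(*) only has to count rows. COUNT(DISTINCT ...) must track every
  // distinct uid it has seen (typically in a hash table) before it can answer.
  const agg = distinctUid ? `COUNT(DISTINCT "hidden_bff_uid")` : "COUNT(*)";
  return `SELECT ${agg} AS num_files FROM ${t}`;
}

console.log(countSql("table", true));
// SELECT COUNT(DISTINCT "hidden_bff_uid") AS num_files FROM "table"
```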

5. buildGetFilesSQL — add MD5(CAST(hidden_bff_uid AS VARCHAR)) as secondary sort key

// Before: ORDER BY "File Size" DESC, hidden_bff_uid
// After:  ORDER BY "File Size" DESC, hidden_bff_uid, MD5(CAST("hidden_bff_uid" AS VARCHAR))

Forces MD5 computation for every row before the top-N can be returned. Affects: sort_and_paginate. At 10M rows: 2662ms → 4451ms (+67%).
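
Change 5 only touches the ORDER BY clause, so this hypothetical `orderByClause` sketch builds just that part; the real `buildGetFilesSQL` also emits the SELECT, filters and pagination, which are omitted here.

```typescript
// Hypothetical sketch of the ORDER BY clause of change 5 only.
function orderByClause(sortColumn: string, descending: boolean, md5TieBreak: boolean): string {
  const keys = [
    `"${sortColumn.replace(/"/g, '""')}" ${descending ? "DESC" : "ASC"}`,
    "hidden_bff_uid", // existing tie-breaker
  ];
  if (md5TieBreak) {
    // If hidden_bff_uid is unique, this third key can never reorder rows,
    // yet per the PR description it still forces an MD5 per row.
    keys.push(`MD5(CAST("hidden_bff_uid" AS VARCHAR))`);
  }
  return `ORDER BY ${keys.join(", ")}`;
}

console.log(orderByClause("File Size", true, true));
// ORDER BY "File Size" DESC, hidden_bff_uid, MD5(CAST("hidden_bff_uid" AS VARCHAR))
```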

Observed results (vs PR #739, latest run)

Query	Scale	Delta
`count_all`	10M rows	+16,209% ❌
`count_all`	1M rows	+2,976% ❌
`count_all`	100k rows	+208% ❌
`sort_and_paginate`	10M rows	+67% ❌
`sort_and_paginate`	1M rows	+76% ❌
`sort_and_paginate`	100k rows	+66% ❌
`distinct_values` (wide)	1M rows	+73% ❌
`text_search`	all scales	~+16-19%
`multi_column_filter`	all scales	~+7-9%
`fetch_annotations`	all scales	~+4-8%
`filter_by_size`	all scales	~0% (correctly unaffected)

Cloud queries (100k and 1M rows over HTTP) show the same regression pattern, confirming the HTTP range-request code path is also covered.

Test plan

Benchmark workflow triggered with PR #739 (feat: query benchmark system with DuckDB-WASM) as base and this branch as compare
count_all shows massive regression from COUNT(DISTINCT) at all scales
sort_and_paginate shows large regression from MD5 secondary sort at all scales
text_search and multi_column_filter show consistent regressions from LOWER() overhead (~16-19% and ~7-9% respectively)
distinct_values wide schema shows large regression from added sort pass on high-cardinality data
fetch_annotations shows modest regression from added sort pass
filter_by_size shows near-zero delta (correctly unaffected)
Cloud results at 100k and 1M rows show same regression pattern as in-memory

🤖 Generated with Claude Code

github-actions · 2026-04-15T20:24:47Z

BFF Query Benchmark Results

Query	Base (`740/merge`)	PR (`740/merge`)	Delta
DuckDB init	773.2ms	910.2ms	+17.7%

In-memory queries — narrow schema (p50 ms)
`fetch_annotations` @ 10,000 rows	3.18ms	24.0ms	+652.9% ❌
`fetch_annotations` @ 100,000 rows	3.89ms	24.6ms	+532.1% ❌
`fetch_annotations` @ 1,000,000 rows	2.48ms	22.8ms	+818.8% ❌
`fetch_annotations` @ 10,000,000 rows	2.38ms	22.8ms	+861.1% ❌
`count_all` @ 10,000 rows	0.91ms	21.4ms	+2247.3% ❌
`count_all` @ 100,000 rows	0.96ms	21.2ms	+2104.2% ❌
`count_all` @ 1,000,000 rows	1.01ms	21.4ms	+2018.8% ❌
`count_all` @ 10,000,000 rows	2.06ms	22.4ms	+985.2% ❌
`filter_by_size` @ 10,000 rows	1.79ms	22.1ms	+1140.6% ❌
`filter_by_size` @ 100,000 rows	2.05ms	22.3ms	+985.9% ❌
`filter_by_size` @ 1,000,000 rows	1.22ms	21.6ms	+1665.3% ❌
`filter_by_size` @ 10,000,000 rows	1.23ms	21.6ms	+1661.6% ❌
`sort_and_paginate` @ 10,000 rows	3.11ms	23.1ms	+643.3% ❌
`sort_and_paginate` @ 100,000 rows	7.60ms	26.6ms	+249.7% ❌
`sort_and_paginate` @ 1,000,000 rows	36.9ms	57.7ms	+56.2% ❌
`sort_and_paginate` @ 10,000,000 rows	313.4ms	343.9ms	+9.7%
`multi_column_filter` @ 10,000 rows	1.48ms	21.6ms	+1356.4% ❌
`multi_column_filter` @ 100,000 rows	1.32ms	21.4ms	+1524.7% ❌
`multi_column_filter` @ 1,000,000 rows	1.29ms	21.2ms	+1536.3% ❌
`multi_column_filter` @ 10,000,000 rows	1.14ms	21.3ms	+1764.0% ❌
`group_and_aggregate` @ 10,000 rows	2.14ms	22.2ms	+938.6% ❌
`group_and_aggregate` @ 100,000 rows	3.06ms	22.7ms	+640.9% ❌
`group_and_aggregate` @ 1,000,000 rows	8.41ms	29.1ms	+246.0% ❌
`group_and_aggregate` @ 10,000,000 rows	72.9ms	93.2ms	+27.7% ⚠️
`text_search` @ 10,000 rows	1.83ms	22.5ms	+1129.2% ❌
`text_search` @ 100,000 rows	2.28ms	22.0ms	+866.4% ❌
`text_search` @ 1,000,000 rows	1.53ms	21.9ms	+1331.7% ❌
`text_search` @ 10,000,000 rows	1.50ms	21.8ms	+1351.0% ❌

Cloud queries — HTTP parquet fixture (p50 ms) (network baseline: base 1.8ms, PR 1.8ms)
`fetch_annotations`	2.25ms	22.5ms	+903.3% ❌
`count_all`	1.14ms	21.4ms	+1777.6% ❌
`filter_by_size`	4.40ms	23.7ms	+437.7% ❌
`sort_and_paginate`	5.17ms	25.1ms	+384.9% ❌
`multi_column_filter`	1.54ms	21.5ms	+1297.7% ❌
`group_and_aggregate`	1.98ms	22.2ms	+1025.8% ❌
`text_search`	2.88ms	23.6ms	+719.1% ❌

Wide schema results (p50 ms)

Query	Base (`740/merge`)	PR (`740/merge`)	Delta
In-memory queries — wide schema (p50 ms)
`fetch_annotations` @ 10,000 rows	2.28ms	22.7ms	+894.3% ❌
`fetch_annotations` @ 100,000 rows	2.32ms	22.5ms	+869.8% ❌
`fetch_annotations` @ 1,000,000 rows	2.60ms	22.4ms	+760.3% ❌
`count_all` @ 10,000 rows	0.80ms	21.1ms	+2518.0% ❌
`count_all` @ 100,000 rows	0.79ms	21.1ms	+2577.2% ❌
`count_all` @ 1,000,000 rows	0.92ms	21.2ms	+2211.5% ❌
`filter_by_size` @ 10,000 rows	2.13ms	22.1ms	+939.7% ❌
`filter_by_size` @ 100,000 rows	1.65ms	22.0ms	+1235.0% ❌
`filter_by_size` @ 1,000,000 rows	1.84ms	22.0ms	+1093.5% ❌
`sort_and_paginate` @ 10,000 rows	2.13ms	22.6ms	+961.7% ❌
`sort_and_paginate` @ 100,000 rows	6.02ms	26.8ms	+344.6% ❌
`sort_and_paginate` @ 1,000,000 rows	41.4ms	61.7ms	+49.2% ⚠️
`multi_column_filter` @ 10,000 rows	1.36ms	21.7ms	+1491.9% ❌
`multi_column_filter` @ 100,000 rows	1.34ms	21.9ms	+1534.7% ❌
`multi_column_filter` @ 1,000,000 rows	1.67ms	21.7ms	+1204.5% ❌
`group_and_aggregate` @ 10,000 rows	4.90ms	24.5ms	+399.0% ❌
`group_and_aggregate` @ 100,000 rows	23.2ms	42.0ms	+80.7% ❌
`group_and_aggregate` @ 1,000,000 rows	231.1ms	249.0ms	+7.7%
`text_search` @ 10,000 rows	2.10ms	22.5ms	+973.3% ❌
`text_search` @ 100,000 rows	1.93ms	22.2ms	+1051.2% ❌
`text_search` @ 1,000,000 rows	1.90ms	22.2ms	+1066.1% ❌

p95 timings (narrow schema)

Query	Scale	Base p95	PR p95	Delta
`fetch_annotations`	10,000	6.65ms	26.2ms	+294.1% ❌
`fetch_annotations`	100,000	4.98ms	27.2ms	+446.4% ❌
`fetch_annotations`	1,000,000	2.86ms	23.6ms	+727.5% ❌
`fetch_annotations`	10,000,000	2.49ms	23.8ms	+856.7% ❌
`count_all`	10,000	0.96ms	21.6ms	+2160.2% ❌
`count_all`	100,000	1.26ms	21.5ms	+1603.2% ❌
`count_all`	1,000,000	1.63ms	22.1ms	+1254.4% ❌
`count_all`	10,000,000	2.57ms	22.8ms	+787.0% ❌
`filter_by_size`	10,000	2.00ms	22.9ms	+1044.5% ❌
`filter_by_size`	100,000	3.38ms	22.6ms	+569.2% ❌
`filter_by_size`	1,000,000	1.49ms	22.0ms	+1378.8% ❌
`filter_by_size`	10,000,000	1.60ms	22.2ms	+1281.6% ❌
`sort_and_paginate`	10,000	6.19ms	25.1ms	+305.9% ❌
`sort_and_paginate`	100,000	9.73ms	27.3ms	+180.1% ❌
`sort_and_paginate`	1,000,000	38.0ms	64.6ms	+69.8% ❌
`sort_and_paginate`	10,000,000	316.2ms	344.7ms	+9.0%
`multi_column_filter`	10,000	2.18ms	22.0ms	+907.3% ❌
`multi_column_filter`	100,000	2.17ms	22.7ms	+947.9% ❌
`multi_column_filter`	1,000,000	1.63ms	21.7ms	+1226.0% ❌
`multi_column_filter`	10,000,000	1.74ms	21.4ms	+1127.0% ❌
`group_and_aggregate`	10,000	2.81ms	22.8ms	+709.6% ❌
`group_and_aggregate`	100,000	4.64ms	24.3ms	+423.1% ❌
`group_and_aggregate`	1,000,000	9.39ms	33.2ms	+253.6% ❌
`group_and_aggregate`	10,000,000	88.7ms	95.2ms	+7.3%
`text_search`	10,000	4.37ms	23.0ms	+427.6% ❌
`text_search`	100,000	2.60ms	23.3ms	+793.3% ❌
`text_search`	1,000,000	2.36ms	22.9ms	+870.8% ❌
`text_search`	10,000,000	2.10ms	22.4ms	+963.9% ❌

Summary

27 regressions (≥25% slower):

❌ count_all @ 10,000 rows: 0.91ms → 21.4ms (+2247.3%)
❌ count_all @ 100,000 rows: 0.96ms → 21.2ms (+2104.2%)
❌ count_all @ 1,000,000 rows: 1.01ms → 21.4ms (+2018.8%)
❌ multi_column_filter @ 10,000,000 rows: 1.14ms → 21.3ms (+1764.0%)
❌ filter_by_size @ 1,000,000 rows: 1.22ms → 21.6ms (+1665.3%)
❌ filter_by_size @ 10,000,000 rows: 1.23ms → 21.6ms (+1661.6%)
❌ multi_column_filter @ 1,000,000 rows: 1.29ms → 21.2ms (+1536.3%)
❌ multi_column_filter @ 100,000 rows: 1.32ms → 21.4ms (+1524.7%)
❌ multi_column_filter @ 10,000 rows: 1.48ms → 21.6ms (+1356.4%)
❌ text_search @ 10,000,000 rows: 1.50ms → 21.8ms (+1351.0%)
❌ text_search @ 1,000,000 rows: 1.53ms → 21.9ms (+1331.7%)
❌ filter_by_size @ 10,000 rows: 1.79ms → 22.1ms (+1140.6%)
❌ text_search @ 10,000 rows: 1.83ms → 22.5ms (+1129.2%)
❌ filter_by_size @ 100,000 rows: 2.05ms → 22.3ms (+985.9%)
❌ count_all @ 10,000,000 rows: 2.06ms → 22.4ms (+985.2%)
❌ group_and_aggregate @ 10,000 rows: 2.14ms → 22.2ms (+938.6%)
❌ text_search @ 100,000 rows: 2.28ms → 22.0ms (+866.4%)
❌ fetch_annotations @ 10,000,000 rows: 2.38ms → 22.8ms (+861.1%)
❌ fetch_annotations @ 1,000,000 rows: 2.48ms → 22.8ms (+818.8%)
❌ fetch_annotations @ 10,000 rows: 3.18ms → 24.0ms (+652.9%)
❌ sort_and_paginate @ 10,000 rows: 3.11ms → 23.1ms (+643.3%)
❌ group_and_aggregate @ 100,000 rows: 3.06ms → 22.7ms (+640.9%)
❌ fetch_annotations @ 100,000 rows: 3.89ms → 24.6ms (+532.1%)
❌ sort_and_paginate @ 100,000 rows: 7.60ms → 26.6ms (+249.7%)
❌ group_and_aggregate @ 1,000,000 rows: 8.41ms → 29.1ms (+246.0%)
❌ sort_and_paginate @ 1,000,000 rows: 36.9ms → 57.7ms (+56.2%)
⚠️ group_and_aggregate @ 10,000,000 rows: 72.9ms → 93.2ms (+27.7%)
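
The flags in these tables can be reproduced from the p50 values with a small classifier. This is a sketch that assumes the delta is (PR − base) / base and uses the thresholds from the report's legend (⚠️ ≥25% slower, ❌ ≥50% slower, ✅ ≥10% faster); the workflow's actual code is not part of this PR. Recomputing from the rounded table values can differ in the first decimal, e.g. 72.9ms → 93.2ms gives +27.8% here versus the reported +27.7%, presumably because the bot works from unrounded timings.

```typescript
type Flag = "❌" | "⚠️" | "✅" | "";

// Percent change of PR time relative to base time (positive = slower).
function percentDelta(baseMs: number, prMs: number): number {
  return ((prMs - baseMs) / baseMs) * 100;
}

// Thresholds taken from the report legend; checked from most to least severe.
function flagFor(baseMs: number, prMs: number): Flag {
  const d = percentDelta(baseMs, prMs);
  if (d >= 50) return "❌"; // ≥50% slower
  if (d >= 25) return "⚠️"; // ≥25% slower
  if (d <= -10) return "✅"; // ≥10% faster
  return "";
}
```

For example, `flagFor(36.9, 57.7)` yields ❌ and `flagFor(313.4, 343.9)` yields no flag (+9.7%), matching the `sort_and_paginate` rows.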

Benchmarks run in headless Chromium with DuckDB-WASM. Each query: 1 warm-up + 10 timed iterations. Flags: ⚠️ ≥25% slower · ❌ ≥50% slower · ✅ ≥10% faster
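
A rough sketch of the per-query timing loop described above (one warm-up, then 10 timed iterations, summarized as p50 and p95). The nearest-rank percentile definition and the `benchmarkQuery` name are assumptions; the workflow may compute percentiles differently. With nearest-rank and only 10 samples, p95 is simply the slowest timed run.

```typescript
// Nearest-rank percentile over an ascending-sorted array (p in (0, 100]).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Runs `query` once untimed as a warm-up, then `iterations` timed runs,
// and reports p50/p95 in milliseconds.
async function benchmarkQuery(
  query: () => Promise<unknown>,
  iterations = 10,
): Promise<{ p50: number; p95: number; samples: number[] }> {
  await query(); // warm-up, not recorded
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await query();
    samples.push(performance.now() - start);
  }
  const sorted = [...samples].sort((a, b) => a - b);
  return { p50: percentile(sorted, 50), p95: percentile(sorted, 95), samples };
}
```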

Three realistic changes a developer might plausibly commit, each targeting a different benchmark query:

1. SQLBuilder.regexMatchValueInList: wrap column in LOWER() for case-insensitive matching. Forces a per-row function call on every scanned row. Affects: text_search, multi_column_filter.
2. buildDistinctValuesSQL: add ORDER BY 1 so dropdown values come back sorted. Adds a sort pass after hash-distinct. Affects: distinct_values.
3. buildFetchAnnotationsSQL: add ORDER BY column_name for predictable schema column ordering. Adds a sort pass on the result set. Affects: fetch_annotations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add two more expensive query patterns to the slow-test branch:

- COUNT(DISTINCT hidden_bff_uid) instead of COUNT(*): forces full hash aggregation over all rows, significantly slower at large scales
- MD5(CAST(hidden_bff_uid AS VARCHAR)) as secondary sort key in buildGetFilesSQL: forces MD5 computation per row for every paginated query, impacting sort_and_paginate, filter_by_size, multi_column_filter, and text_search

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch 5 times, most recently from a09f226 to f7b151a, April 15, 2026 21:31

BrianWhitneyAI mentioned this pull request in feat: query benchmark system with DuckDB-WASM #739 (Merged), Apr 15, 2026

BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from bfcf726 to f44b3ad, April 15, 2026 21:51

BrianWhitneyAI changed the title from "test: artificial query delay to verify benchmark regression detection" to "test: source-level query regressions to verify benchmark detection", Apr 15, 2026

BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch 5 times, most recently from 241d0de to 2cf6f30, April 16, 2026 00:50

BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from 2cf6f30 to a4f2ad7, April 22, 2026 23:26

BrianWhitneyAI and others added 2 commits April 23, 2026 12:37

BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from a4f2ad7 to f05d1fb, April 23, 2026 19:37

BrianWhitneyAI closed this Apr 27, 2026

BrianWhitneyAI deleted the feature/query-benchmark-slow-test branch April 27, 2026 19:56

BrianWhitneyAI wants to merge 2 commits into feature/query-benchmark from feature/query-benchmark-slow-test