
test: source-level query regressions to verify benchmark detection #740

Closed
BrianWhitneyAI wants to merge 2 commits into feature/query-benchmark from feature/query-benchmark-slow-test

BrianWhitneyAI (Contributor) commented Apr 15, 2026

Summary

Regression test PR for the benchmark system (PR #739). Introduces five realistic source-level changes — the kind a developer might plausibly commit — to verify the benchmark detects them automatically against parquet-backed views at realistic data scales.

Because queries.ts calls the same build*SQL functions as the app, no changes to the benchmark code were needed. All regressions are picked up purely from changes to packages/core/.
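
A minimal sketch of why sharing the builders is enough, under stated assumptions: the `buildGetCountSQL(table)` signature, the `BenchmarkQuery` shape, and the `countAllQuery` helper are hypothetical stand-ins, not the actual code in packages/core/ or queries.ts. Only the two SQL strings come from this PR (change 4).

```typescript
// Hypothetical stand-in for a builder in packages/core/ (real signature not shown in this PR).
function buildGetCountSQL(table: string): string {
  return `SELECT COUNT(*) AS num_files FROM "${table}"`;
}

// Hypothetical variant carrying the regression from change 4.
function buildGetCountSQLRegressed(table: string): string {
  return `SELECT COUNT(DISTINCT "hidden_bff_uid") AS num_files FROM "${table}"`;
}

// Hypothetical shape of a benchmark entry: it holds the builder, not a copied SQL string.
interface BenchmarkQuery {
  name: string;
  buildSQL: (table: string) => string;
}

// The benchmark definition never changes; only the builder it points at does.
function countAllQuery(builder: (table: string) => string): BenchmarkQuery {
  return { name: "count_all", buildSQL: builder };
}

const baseSQL = countAllQuery(buildGetCountSQL).buildSQL("table");
const prSQL = countAllQuery(buildGetCountSQLRegressed).buildSQL("table");
```

Because the benchmark entry stores a reference to the builder, any edit to the builder changes the SQL the benchmark times, with no edit to the benchmark itself.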

Changes (all in packages/core/)

1. SQLBuilder.regexMatchValueInList — wrap column in LOWER() for case-insensitive matching

// Before
REGEXP_MATCHES(CAST("col" AS VARCHAR), '...')
// After
REGEXP_MATCHES(LOWER(CAST("col" AS VARCHAR)), LOWER('...'))

Forces a per-row LOWER() call on every scanned row. Affects: text_search, multi_column_filter.

2. buildDistinctValuesSQL — add ORDER BY 1 for sorted dropdown values

// Before: SELECT DISTINCT "cell_line" FROM "table"
// After:  SELECT DISTINCT "cell_line" FROM "table" ORDER BY 1

Adds a sort pass after hash-distinct. Affects: distinct_values (especially wide schema with high-cardinality columns).

3. buildFetchAnnotationsSQL — add ORDER BY column_name for predictable schema ordering

// Before: SELECT column_name, data_type FROM information_schema.columns WHERE ...
// After:  ... ORDER BY column_name

Adds a sort pass on the schema introspection result. Affects: fetch_annotations.

4. buildGetCountSQL — use COUNT(DISTINCT hidden_bff_uid) instead of COUNT(*)

// Before: SELECT COUNT(*) AS num_files FROM "table"
// After:  SELECT COUNT(DISTINCT "hidden_bff_uid") AS num_files FROM "table"

Forces full hash aggregation over all rows instead of a simple counter. Affects: count_all. Regression scales dramatically with row count — at 10M rows: 6ms → 1001ms (+16,000%).

5. buildGetFilesSQL — add MD5(CAST(hidden_bff_uid AS VARCHAR)) as secondary sort key

// Before: ORDER BY "File Size" DESC, hidden_bff_uid
// After:  ORDER BY "File Size" DESC, hidden_bff_uid, MD5(CAST("hidden_bff_uid" AS VARCHAR))

Forces MD5 computation for every row before the top-N can be returned. Affects: sort_and_paginate. At 10M rows: 2662ms → 4451ms (+67%).

Observed results (vs PR #739, latest run)

| Query | Scale | Delta |
| --- | --- | --- |
| count_all | 10M rows | +16,209% ❌ |
| count_all | 1M rows | +2,976% ❌ |
| count_all | 100k rows | +208% ❌ |
| sort_and_paginate | 10M rows | +67% ❌ |
| sort_and_paginate | 1M rows | +76% ❌ |
| sort_and_paginate | 100k rows | +66% ❌ |
| distinct_values (wide) | 1M rows | +73% ❌ |
| text_search | all scales | ~+16-19% |
| multi_column_filter | all scales | ~+7-9% |
| fetch_annotations | all scales | ~+4-8% |
| filter_by_size | all scales | ~0% (correctly unaffected) |

Cloud queries (100k and 1M rows over HTTP) show the same regression pattern, confirming the HTTP range-request code path is also covered.
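
To make the count_all rows in the table above easier to compare, a percentage delta can be read as a multiple of the base time. The sketch below is illustrative only: `slowdownFactor` is a hypothetical helper, and the inputs are the three count_all deltas copied from the table.

```typescript
// Convert a percentage delta (e.g. +208%) into a multiple of the base time (e.g. ~3.08x).
function slowdownFactor(deltaPercent: number): number {
  return 1 + deltaPercent / 100;
}

// count_all deltas copied from the observed-results table.
const countAllDeltas: Array<{ scale: string; deltaPercent: number }> = [
  { scale: "100k rows", deltaPercent: 208 },
  { scale: "1M rows", deltaPercent: 2976 },
  { scale: "10M rows", deltaPercent: 16209 },
];

// One summary line per scale, e.g. "100k rows: 3.08x the base time".
const summary: string[] = countAllDeltas.map(
  (d) => `${d.scale}: ${slowdownFactor(d.deltaPercent).toFixed(2)}x the base time`
);
```

Read this way, the count_all regression grows from roughly 3x at 100k rows to roughly 163x at 10M rows.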

Test plan

  • Benchmark workflow triggered with PR #739 (feat: query benchmark system with DuckDB-WASM) as base and this branch as compare
  • count_all shows massive regression from COUNT(DISTINCT) at all scales
  • sort_and_paginate shows large regression from MD5 secondary sort at all scales
  • text_search and multi_column_filter show consistent ~15-19% regression from LOWER() overhead
  • distinct_values wide schema shows large regression from added sort pass on high-cardinality data
  • fetch_annotations shows modest regression from added sort pass
  • filter_by_size shows near-zero delta (correctly unaffected)
  • Cloud results at 100k and 1M rows show same regression pattern as in-memory
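
The checks in this plan could be expressed as assertions over the observed deltas. The following is a sketch only: `Observation`, `checkPlan`, and both thresholds are hypothetical (25% mirrors the benchmark's warning flag, the 5% noise tolerance is an assumption), and the sample values are copied from the observed-results table.

```typescript
interface Observation {
  query: string;
  deltaPercent: number;      // observed p50 delta, in percent
  expectRegression: boolean; // true if a change in this PR targets the query
}

// Returns one message per violated expectation; an empty array means the plan holds.
function checkPlan(
  observations: Observation[],
  regressionThreshold = 25, // assumption: same cutoff as the benchmark's warning flag
  noiseTolerance = 5        // assumption: how far an unaffected query may drift
): string[] {
  const failures: string[] = [];
  observations.forEach((o) => {
    if (o.expectRegression && o.deltaPercent < regressionThreshold) {
      failures.push(`${o.query}: expected a regression, saw ${o.deltaPercent}%`);
    }
    if (!o.expectRegression && Math.abs(o.deltaPercent) > noiseTolerance) {
      failures.push(`${o.query}: expected no change, saw ${o.deltaPercent}%`);
    }
  });
  return failures;
}

// Values copied from the observed-results table.
const observed: Observation[] = [
  { query: "count_all @ 10M", deltaPercent: 16209, expectRegression: true },
  { query: "sort_and_paginate @ 10M", deltaPercent: 67, expectRegression: true },
  { query: "distinct_values (wide) @ 1M", deltaPercent: 73, expectRegression: true },
  { query: "filter_by_size", deltaPercent: 0, expectRegression: false },
];

const failures = checkPlan(observed);
```

Including an unaffected control such as filter_by_size matters: without it, a uniform slowdown across every query would look the same as the targeted regressions.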

🤖 Generated with Claude Code

@github-actions

BFF Query Benchmark Results

Base (740/merge) PR (740/merge) Delta
DuckDB init 773.2ms 910.2ms +17.7%
In-memory queries — narrow schema (p50 ms)
fetch_annotations @ 10,000 rows 3.18ms 24.0ms +652.9% ❌
fetch_annotations @ 100,000 rows 3.89ms 24.6ms +532.1% ❌
fetch_annotations @ 1,000,000 rows 2.48ms 22.8ms +818.8% ❌
fetch_annotations @ 10,000,000 rows 2.38ms 22.8ms +861.1% ❌
count_all @ 10,000 rows 0.91ms 21.4ms +2247.3% ❌
count_all @ 100,000 rows 0.96ms 21.2ms +2104.2% ❌
count_all @ 1,000,000 rows 1.01ms 21.4ms +2018.8% ❌
count_all @ 10,000,000 rows 2.06ms 22.4ms +985.2% ❌
filter_by_size @ 10,000 rows 1.79ms 22.1ms +1140.6% ❌
filter_by_size @ 100,000 rows 2.05ms 22.3ms +985.9% ❌
filter_by_size @ 1,000,000 rows 1.22ms 21.6ms +1665.3% ❌
filter_by_size @ 10,000,000 rows 1.23ms 21.6ms +1661.6% ❌
sort_and_paginate @ 10,000 rows 3.11ms 23.1ms +643.3% ❌
sort_and_paginate @ 100,000 rows 7.60ms 26.6ms +249.7% ❌
sort_and_paginate @ 1,000,000 rows 36.9ms 57.7ms +56.2% ❌
sort_and_paginate @ 10,000,000 rows 313.4ms 343.9ms +9.7%
multi_column_filter @ 10,000 rows 1.48ms 21.6ms +1356.4% ❌
multi_column_filter @ 100,000 rows 1.32ms 21.4ms +1524.7% ❌
multi_column_filter @ 1,000,000 rows 1.29ms 21.2ms +1536.3% ❌
multi_column_filter @ 10,000,000 rows 1.14ms 21.3ms +1764.0% ❌
group_and_aggregate @ 10,000 rows 2.14ms 22.2ms +938.6% ❌
group_and_aggregate @ 100,000 rows 3.06ms 22.7ms +640.9% ❌
group_and_aggregate @ 1,000,000 rows 8.41ms 29.1ms +246.0% ❌
group_and_aggregate @ 10,000,000 rows 72.9ms 93.2ms +27.7% ⚠️
text_search @ 10,000 rows 1.83ms 22.5ms +1129.2% ❌
text_search @ 100,000 rows 2.28ms 22.0ms +866.4% ❌
text_search @ 1,000,000 rows 1.53ms 21.9ms +1331.7% ❌
text_search @ 10,000,000 rows 1.50ms 21.8ms +1351.0% ❌
Cloud queries — HTTP parquet fixture (p50 ms) (network baseline: base 1.8ms, PR 1.8ms)
fetch_annotations 2.25ms 22.5ms +903.3% ❌
count_all 1.14ms 21.4ms +1777.6% ❌
filter_by_size 4.40ms 23.7ms +437.7% ❌
sort_and_paginate 5.17ms 25.1ms +384.9% ❌
multi_column_filter 1.54ms 21.5ms +1297.7% ❌
group_and_aggregate 1.98ms 22.2ms +1025.8% ❌
text_search 2.88ms 23.6ms +719.1% ❌
Wide schema results (p50 ms)
Base (740/merge) PR (740/merge) Delta
In-memory queries — wide schema (p50 ms)
fetch_annotations @ 10,000 rows 2.28ms 22.7ms +894.3% ❌
fetch_annotations @ 100,000 rows 2.32ms 22.5ms +869.8% ❌
fetch_annotations @ 1,000,000 rows 2.60ms 22.4ms +760.3% ❌
count_all @ 10,000 rows 0.80ms 21.1ms +2518.0% ❌
count_all @ 100,000 rows 0.79ms 21.1ms +2577.2% ❌
count_all @ 1,000,000 rows 0.92ms 21.2ms +2211.5% ❌
filter_by_size @ 10,000 rows 2.13ms 22.1ms +939.7% ❌
filter_by_size @ 100,000 rows 1.65ms 22.0ms +1235.0% ❌
filter_by_size @ 1,000,000 rows 1.84ms 22.0ms +1093.5% ❌
sort_and_paginate @ 10,000 rows 2.13ms 22.6ms +961.7% ❌
sort_and_paginate @ 100,000 rows 6.02ms 26.8ms +344.6% ❌
sort_and_paginate @ 1,000,000 rows 41.4ms 61.7ms +49.2% ⚠️
multi_column_filter @ 10,000 rows 1.36ms 21.7ms +1491.9% ❌
multi_column_filter @ 100,000 rows 1.34ms 21.9ms +1534.7% ❌
multi_column_filter @ 1,000,000 rows 1.67ms 21.7ms +1204.5% ❌
group_and_aggregate @ 10,000 rows 4.90ms 24.5ms +399.0% ❌
group_and_aggregate @ 100,000 rows 23.2ms 42.0ms +80.7% ❌
group_and_aggregate @ 1,000,000 rows 231.1ms 249.0ms +7.7%
text_search @ 10,000 rows 2.10ms 22.5ms +973.3% ❌
text_search @ 100,000 rows 1.93ms 22.2ms +1051.2% ❌
text_search @ 1,000,000 rows 1.90ms 22.2ms +1066.1% ❌
p95 timings (narrow schema)
Query Scale Base p95 PR p95 Delta
fetch_annotations 10,000 6.65ms 26.2ms +294.1% ❌
fetch_annotations 100,000 4.98ms 27.2ms +446.4% ❌
fetch_annotations 1,000,000 2.86ms 23.6ms +727.5% ❌
fetch_annotations 10,000,000 2.49ms 23.8ms +856.7% ❌
count_all 10,000 0.96ms 21.6ms +2160.2% ❌
count_all 100,000 1.26ms 21.5ms +1603.2% ❌
count_all 1,000,000 1.63ms 22.1ms +1254.4% ❌
count_all 10,000,000 2.57ms 22.8ms +787.0% ❌
filter_by_size 10,000 2.00ms 22.9ms +1044.5% ❌
filter_by_size 100,000 3.38ms 22.6ms +569.2% ❌
filter_by_size 1,000,000 1.49ms 22.0ms +1378.8% ❌
filter_by_size 10,000,000 1.60ms 22.2ms +1281.6% ❌
sort_and_paginate 10,000 6.19ms 25.1ms +305.9% ❌
sort_and_paginate 100,000 9.73ms 27.3ms +180.1% ❌
sort_and_paginate 1,000,000 38.0ms 64.6ms +69.8% ❌
sort_and_paginate 10,000,000 316.2ms 344.7ms +9.0%
multi_column_filter 10,000 2.18ms 22.0ms +907.3% ❌
multi_column_filter 100,000 2.17ms 22.7ms +947.9% ❌
multi_column_filter 1,000,000 1.63ms 21.7ms +1226.0% ❌
multi_column_filter 10,000,000 1.74ms 21.4ms +1127.0% ❌
group_and_aggregate 10,000 2.81ms 22.8ms +709.6% ❌
group_and_aggregate 100,000 4.64ms 24.3ms +423.1% ❌
group_and_aggregate 1,000,000 9.39ms 33.2ms +253.6% ❌
group_and_aggregate 10,000,000 88.7ms 95.2ms +7.3%
text_search 10,000 4.37ms 23.0ms +427.6% ❌
text_search 100,000 2.60ms 23.3ms +793.3% ❌
text_search 1,000,000 2.36ms 22.9ms +870.8% ❌
text_search 10,000,000 2.10ms 22.4ms +963.9% ❌

Summary

27 regressions (≥25% slower):

  • count_all @ 10,000 rows: 0.91ms → 21.4ms (+2247.3%)
  • count_all @ 100,000 rows: 0.96ms → 21.2ms (+2104.2%)
  • count_all @ 1,000,000 rows: 1.01ms → 21.4ms (+2018.8%)
  • multi_column_filter @ 10,000,000 rows: 1.14ms → 21.3ms (+1764.0%)
  • filter_by_size @ 1,000,000 rows: 1.22ms → 21.6ms (+1665.3%)
  • filter_by_size @ 10,000,000 rows: 1.23ms → 21.6ms (+1661.6%)
  • multi_column_filter @ 1,000,000 rows: 1.29ms → 21.2ms (+1536.3%)
  • multi_column_filter @ 100,000 rows: 1.32ms → 21.4ms (+1524.7%)
  • multi_column_filter @ 10,000 rows: 1.48ms → 21.6ms (+1356.4%)
  • text_search @ 10,000,000 rows: 1.50ms → 21.8ms (+1351.0%)
  • text_search @ 1,000,000 rows: 1.53ms → 21.9ms (+1331.7%)
  • filter_by_size @ 10,000 rows: 1.79ms → 22.1ms (+1140.6%)
  • text_search @ 10,000 rows: 1.83ms → 22.5ms (+1129.2%)
  • filter_by_size @ 100,000 rows: 2.05ms → 22.3ms (+985.9%)
  • count_all @ 10,000,000 rows: 2.06ms → 22.4ms (+985.2%)
  • group_and_aggregate @ 10,000 rows: 2.14ms → 22.2ms (+938.6%)
  • text_search @ 100,000 rows: 2.28ms → 22.0ms (+866.4%)
  • fetch_annotations @ 10,000,000 rows: 2.38ms → 22.8ms (+861.1%)
  • fetch_annotations @ 1,000,000 rows: 2.48ms → 22.8ms (+818.8%)
  • fetch_annotations @ 10,000 rows: 3.18ms → 24.0ms (+652.9%)
  • sort_and_paginate @ 10,000 rows: 3.11ms → 23.1ms (+643.3%)
  • group_and_aggregate @ 100,000 rows: 3.06ms → 22.7ms (+640.9%)
  • fetch_annotations @ 100,000 rows: 3.89ms → 24.6ms (+532.1%)
  • sort_and_paginate @ 100,000 rows: 7.60ms → 26.6ms (+249.7%)
  • group_and_aggregate @ 1,000,000 rows: 8.41ms → 29.1ms (+246.0%)
  • sort_and_paginate @ 1,000,000 rows: 36.9ms → 57.7ms (+56.2%)
  • ⚠️ group_and_aggregate @ 10,000,000 rows: 72.9ms → 93.2ms (+27.7%)

Benchmarks run in headless Chromium with DuckDB-WASM. Each query: 1 warm-up + 10 timed iterations. Flags: ⚠️ ≥25% slower · ❌ ≥50% slower · ✅ ≥10% faster
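
A minimal sketch of how figures like these could be derived from raw timings, assuming a nearest-rank percentile; the actual benchmark may compute p50 and p95 differently, and all function names here are hypothetical. The flag thresholds are the ones stated in the footer above.

```typescript
// Drop the warm-up run(s) and keep only the timed iterations.
function timedSamples(allRunsMs: number[], warmUps = 1): number[] {
  return allRunsMs.slice(warmUps);
}

// Nearest-rank percentile (assumption: the real benchmark may interpolate instead).
function percentile(samplesMs: number[], p: number): number {
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

// Relative change of the PR timing against the base timing, in percent.
function deltaPercent(baseMs: number, prMs: number): number {
  return ((prMs - baseMs) / baseMs) * 100;
}

// Flags as described in the footer: ⚠️ ≥25% slower, ❌ ≥50% slower, ✅ ≥10% faster.
function flagFor(delta: number): string {
  if (delta >= 50) return "❌";
  if (delta >= 25) return "⚠️";
  if (delta <= -10) return "✅";
  return "";
}

// Example with p50 values from the narrow-schema table (sort_and_paginate @ 10,000,000 rows).
const exampleDelta = deltaPercent(313.4, 343.9);
const exampleFlag = flagFor(exampleDelta);
```

For the example row, the delta comes out just under 10%, which is below the 25% cutoff, so no flag is attached.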

@BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch 5 times, most recently from a09f226 to f7b151a, April 15, 2026 21:31
@BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from bfcf726 to f44b3ad, April 15, 2026 21:51
@BrianWhitneyAI changed the title from "test: artificial query delay to verify benchmark regression detection" to "test: source-level query regressions to verify benchmark detection", Apr 15, 2026
@BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch 5 times, most recently from 241d0de to 2cf6f30, April 16, 2026 00:50
@BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from 2cf6f30 to a4f2ad7, April 22, 2026 23:26
BrianWhitneyAI and others added 2 commits April 23, 2026 12:37
Three realistic changes a developer might plausibly commit, each
targeting a different benchmark query:

1. SQLBuilder.regexMatchValueInList — wrap column in LOWER() for
   case-insensitive matching. Forces a per-row function call on every
   scanned row. Affects: text_search, multi_column_filter.

2. buildDistinctValuesSQL — add ORDER BY 1 so dropdown values come
   back sorted. Adds a sort pass after hash-distinct.
   Affects: distinct_values.

3. buildFetchAnnotationsSQL — add ORDER BY column_name for predictable
   schema column ordering. Adds a sort pass on the result set.
   Affects: fetch_annotations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add two more expensive query patterns to the slow-test branch:
- COUNT(DISTINCT hidden_bff_uid) instead of COUNT(*) — forces full hash
  aggregation over all rows, significantly slower at large scales
- MD5(CAST(hidden_bff_uid AS VARCHAR)) as secondary sort key in
  buildGetFilesSQL — forces MD5 computation per row for every paginated
  query, impacting sort_and_paginate, filter_by_size, multi_column_filter,
  and text_search

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@BrianWhitneyAI force-pushed the feature/query-benchmark-slow-test branch from a4f2ad7 to f05d1fb, April 23, 2026 19:37
@BrianWhitneyAI deleted the feature/query-benchmark-slow-test branch, April 27, 2026 19:56