Skip to content

Add test cases: aggregate 10M+10M, 20M#806

Merged
pgarrison merged 9 commits into
mainfrom
feature/benchmark-aggregate-parquet
May 29, 2026
Merged

Add test cases: aggregate 10M+10M, 20M#806
pgarrison merged 9 commits into
mainfrom
feature/benchmark-aggregate-parquet

Conversation

@pgarrison
Copy link
Copy Markdown
Contributor

@pgarrison pgarrison commented May 20, 2026

Purpose

Test query performance of large aggregate parquets.

Changes

  • Two new benchmark cases: 10M+10M aggregate, and 20M.
    • This required substantial refactoring of the query benchmarking, including both the config data format and results data format
  • TypeScript migration
    • Converted run-local.js, run-regression.js, compare-results.js, and run-benchmark-page.js to .ts with proper type annotations.
    • Added tsx as a devDependency to run TS scripts directly.
    • Updated package.json scripts to use tsx instead of node.
  • CI / workflow updates
    • Doubled timeout to 360 minutes to accommodate larger tests (20 iterations takes 5+ hours).
    • Uses npm run benchmark:* scripts instead of direct node calls.
  • Cleanup uses deleteDataSourceWrapper instead of raw DROP VIEW.

Testing

  • I used these changes thoroughly for the benchmark results presented in Support aggregate parquet data sources #787, but I've had to do some merging and rebasing since then. This run shows the benchmark results for the latest version.
  • I also did some manual tests for npm run benchmark and npm run benchmark:summary with --warmups=0

Reviewing

Note: the base branch is a temporary development branch, the result of merging PRs #739 and #787, both dependencies of this branch.

@pgarrison pgarrison force-pushed the feature/benchmark-aggregate-parquet branch from f923aa3 to 15eeadf Compare May 20, 2026 19:07
@pgarrison pgarrison marked this pull request as ready for review May 20, 2026 19:15
Comment thread .github/workflows/benchmark.yml
Comment thread dev-docs/07-query-benchmarking.md Outdated
Copy link
Copy Markdown
Contributor

@BrianWhitneyAI BrianWhitneyAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that this aggregate benchmark is too much to put on top of the existing benchmark and should be split out into separate scripts.

Comment thread packages/web/scripts/lib/run-benchmark-page.ts
Comment thread packages/web/scripts/run-regression.ts
Comment thread package.json Outdated
"resize-observer": "1.0.x",
"rimraf": "3.0.x",
"sinon": "12.x",
"tsx": "^4.21.0",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the change around the locks?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just slightly worried about dep changes for code that isnt directly used by the app. This was more a comment about those changes as a whole rather than specifically "tsx"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah seeing its in the dev dependecies in the comment below makes me less worried about this

@pgarrison pgarrison changed the base branch from aggregate-parquet-query-benchmarking-merged to main May 28, 2026 00:00
Comment thread package.json Outdated
"resize-observer": "1.0.x",
"rimraf": "3.0.x",
"sinon": "12.x",
"tsx": "^4.21.0",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The versions are handled via the "x" version classifier as opposed to ~ or ^.

Bummer this has to be included in the web deploy

Suggested change
"tsx": "^4.21.0",
"tsx": "4.21.x",

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the dev dependencies, so it shouldn't be in the deployment? Perhaps I'm misunderstanding.

(Updated the version syntax in 6079147)

pgarrison and others added 2 commits May 28, 2026 13:33
Co-authored-by: Sean Du'Hare <41307451+SeanDuHare@users.noreply.github.com>
@pgarrison
Copy link
Copy Markdown
Contributor Author

@pgarrison pgarrison requested a review from BrianWhitneyAI May 28, 2026 22:03
Copy link
Copy Markdown
Contributor

@BrianWhitneyAI BrianWhitneyAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing my bad error commit!

@pgarrison pgarrison merged commit 97697a2 into main May 29, 2026
8 checks passed
@pgarrison
Copy link
Copy Markdown
Contributor Author

@pgarrison pgarrison deleted the feature/benchmark-aggregate-parquet branch May 29, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants