test(app): add PR0 perf baseline probe #607
Conversation
📝 Walkthrough
Adds a Playwright-based perf probe (browser instrumentation, summarization utilities, unit tests), four serial E2E baseline scenarios that collect 3 runs each and write aggregated JSON, a CI workflow to run and upload perf artifacts, and SSE multi-stage streaming support for test fixtures.
Changes
Perf probe measurement pipeline
SSE multi-stage streaming
```mermaid
sequenceDiagram
    participant Runner as Playwright Runner
    participant Page as Browser Page
    participant Probe as __pawwork_perf_probe
    participant Storage as Artifact Upload
    Runner->>Page: installPerfProbe
    Runner->>Page: perform scenario actions (navigate/stream/expand/scroll)
    Runner->>Page: settleFrames
    Runner->>Page: snapshotPerfProbe
    Page-->>Runner: PerfRunSummary
    Runner->>Runner: aggregatePerfRuns (3 runs per scenario)
    Runner->>Storage: upload perf JSON + report + test results
```
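The probe lifecycle in the diagram above can be modeled with a small sketch. The global key `__pawwork_perf_probe` comes from the diagram, but the probe internals (frame recording, the 50 ms long-task threshold) are assumptions, and a plain object stands in for the browser page so no Playwright setup is needed:

```typescript
// Minimal model of install -> record -> snapshot from the sequence diagram.
// A plain object stands in for the browser window; internals are assumed.
interface PerfRunSummary {
  frameCount: number;
  worstFrameMs: number;
  longTasks: number;
}

interface PerfProbe {
  record(frameMs: number): void;
  snapshot(): PerfRunSummary;
}

function installPerfProbe(win: Record<string, unknown>): void {
  const frames: number[] = [];
  const probe: PerfProbe = {
    record(frameMs: number) {
      frames.push(frameMs);
    },
    snapshot(): PerfRunSummary {
      return {
        frameCount: frames.length,
        worstFrameMs: frames.length ? Math.max(...frames) : 0,
        // 50 ms long-task threshold is an assumption, not the real probe's value
        longTasks: frames.filter((ms) => ms > 50).length,
      };
    },
  };
  win["__pawwork_perf_probe"] = probe;
}

// Usage: install, feed a few frame timings, snapshot once per run.
const fakeWindow: Record<string, unknown> = {};
installPerfProbe(fakeWindow);
const fakeProbe = fakeWindow["__pawwork_perf_probe"] as PerfProbe;
[12, 60, 8].forEach((ms) => fakeProbe.record(ms));
console.log(fakeProbe.snapshot()); // { frameCount: 3, worstFrameMs: 60, longTasks: 1 }
```

In the real suite the snapshot would be taken via `page.evaluate` after `settleFrames`, once per run, before aggregation.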
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
⚠️ Review ran into problems: Git: Failed to clone repository.
Suggested priority: P2 (touches user-path files: packages/app/src/testing/perf-metrics.test.ts, packages/app/src/testing/perf-metrics.ts).
P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/app/e2e/perf/probe.ts`:
- Line 18: The reset(label?: string) parameter is declared but never used;
remove the unused label parameter from the type/signature and helper so it
becomes reset(): void (update the declaration where reset: (label?: string) =>
void is defined and the resetPerfProbe function that currently accepts/passes
label), and then update all call sites (e.g., in perf-probe.spec.ts) to stop
passing a label argument. Ensure only the symbol names reset and resetPerfProbe
are changed to the no-arg signature so callers compile cleanly.
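A sketch of the suggested change, with `PerfProbeState` and `makeProbeState` as hypothetical stand-ins for the real declarations in probe.ts:

```typescript
// Hypothetical stand-in for the probe state in probe.ts; the point is the
// signature change from reset(label?: string): void to reset(): void.
interface PerfProbeState {
  samples: number[];
  reset(): void; // was: reset(label?: string) => void, with label never read
}

function makeProbeState(): PerfProbeState {
  const state: PerfProbeState = {
    samples: [],
    reset() {
      // Clear collected samples; no label is needed since none was ever used.
      state.samples.length = 0;
    },
  };
  return state;
}

// Call sites drop the argument: probe.reset("scenario-1") becomes probe.reset()
const state = makeProbeState();
state.samples.push(16.7);
state.reset();
console.log(state.samples.length); // 0
```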
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 733706bc-d0cc-4981-9b2c-982e3cb316ee
📒 Files selected for processing (7)
.github/workflows/perf-probe-baseline.yml
packages/app/.gitignore
packages/app/e2e/perf/perf-probe.spec.ts
packages/app/e2e/perf/probe.ts
packages/app/package.json
packages/app/src/testing/perf-metrics.test.ts
packages/app/src/testing/perf-metrics.ts
Code Review
This pull request introduces a performance testing framework for the application, including a new performance probe utility, test scenarios for baseline generation, and metrics aggregation logic. I have reviewed the implementation and provided feedback regarding the scope of the tool-call-expand test, the unused label parameter in the performance probe reset function, and the redundancy in the performance metrics aggregation logic.
Follow-up on the review bot notes:
Summary
Add the PR0.1 perf-baseline slice for #600: a Playwright perf probe, four baseline scenarios, JSON baseline output, and a non-blocking artifact workflow.
Why
PR #589 regressed interaction performance, and both review and manual checks missed the regression. PR0.1 installs the measurement layer first so PR0.2 can add the comparator and gate on top of a real baseline instead of guesses.
The probe is intentionally separate from debug-bar.tsx and renderer-diagnostics.ts. Those paths serve runtime and developer-facing diagnostics inside the product, while PR0.1 needs a test-owned sampler that runs from page.addInitScript, stays outside the production dependency graph, and can evolve without leaking E2E-only concerns into the shipped app. At this stage, reliability of the measurement path matters more than DRY. If the schema stabilizes and we prove the signal is useful, a later follow-up can extract shared probe primitives deliberately.
Related Issue
Closes #600
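The isolation argument above hinges on page.addInitScript serializing a function into the browser context before any app code runs. A rough model of that serialization, with an illustrative installer body rather than the real probe:

```typescript
// page.addInitScript(fn) stringifies fn and evaluates it in the page before
// any app script runs, so fn cannot close over test-process values or import
// from the app bundle. Modeling that constraint with Function construction:
function probeInstaller(): void {
  (globalThis as Record<string, unknown>)["__pawwork_perf_probe"] = {
    samples: [] as number[],
  };
}

// What Playwright effectively ships to the browser: source text, not a closure.
const initScript = `(${probeInstaller.toString()})()`;
new Function(initScript)();

const installed = (globalThis as Record<string, unknown>)["__pawwork_perf_probe"];
console.log(installed !== undefined); // true
```

Because only the function's source text crosses the boundary, any import from the app's modules would fail at install time, which is exactly the dependency-graph isolation the PR description argues for.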
Human Review Status
Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.
Review Focus
Risk Notes
Low. This adds test-only helpers, a new perf E2E suite, and a non-required artifact workflow. The only runtime-facing change is ignoring packages/app/e2e/perf-results so generated baseline JSON does not dirty the repo.
How To Verify
Screenshots or Recordings
None. This PR adds perf instrumentation, E2E coverage, and CI artifact plumbing only.
Checklist
dev, and my PR title and commit messages use Conventional Commits in English
PR0.2 Notes
interaction_ms and interaction_ms_median are slightly redundant in PR0.1. PR0.2 should consume *_median and *_worst for comparison, so no schema rename is required in this PR.
branch: "dev" is intentionally hardcoded in the PR0.1 baseline artifact. PR0.2 will switch that value to an environment-driven label such as PAWWORK_PERF_BRANCH=base|head when the comparator lands.
Summary by CodeRabbit
Tests
Chores
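The *_median / *_worst consumption described in the PR0.2 notes could look like the sketch below. The field names interaction_ms_median and interaction_ms_worst follow the notes, but the internals of aggregatePerfRuns are assumptions:

```typescript
// Aggregate the three per-scenario runs into one baseline record.
// Field names follow the PR0.2 notes; everything else here is assumed.
interface ScenarioBaseline {
  branch: string;
  interaction_ms_median: number;
  interaction_ms_worst: number;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function aggregatePerfRuns(runsMs: number[], branch = "dev"): ScenarioBaseline {
  return {
    branch, // PR0.2: derive this from PAWWORK_PERF_BRANCH=base|head instead
    interaction_ms_median: median(runsMs),
    interaction_ms_worst: Math.max(...runsMs),
  };
}

console.log(aggregatePerfRuns([120, 90, 150]));
// { branch: "dev", interaction_ms_median: 120, interaction_ms_worst: 150 }
```

With three runs per scenario, the median discards a single noisy run while the worst value preserves the regression signal PR0.2 wants to gate on.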