test: harden session scroll perf guard#635
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR extends the perf baseline suite with a new long-scroll scenario and supporting measurement infrastructure. A new ChangesLong-Scroll Performance Baseline
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Suggested priority: P3 (only low-risk paths changed (packages/app/e2e/perf/perf-probe.spec.ts, packages/app/e2e/perf/profiles.ts, packages/app/e2e/perf/profiles.unit.ts)).
P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.
There was a problem hiding this comment.
Code Review
This pull request introduces a new performance test scenario, session-scroll-reading-long, specifically for low-end profiles to measure timeline rendering and scroll performance. The implementation includes utilities for seeding long sessions, simulating wheel events, and injecting state updates via a custom composer driver. Feedback focuses on improving the reliability of polling assertions with fallback values and correcting a timing mismatch in the scroll pulse checkpoints where the final event was unreachable within the defined duration.
…-guard # Conflicts: # packages/app/e2e/perf/profiles.ts # packages/app/e2e/perf/profiles.unit.ts
Perf delta summaryComparator: pass
|
|
Review response on the heterogeneous message fixture point: I am intentionally keeping that out of this PR. The concern is technically valid: a future guard should cover real assistant/tool/reasoning/image message parts, not only long user text turns. But adding those fixtures here would change the goal of PR1 from "establish a stable long-scroll measurement baseline" to "cover every expensive message-part renderer." That makes attribution worse for the current scroll investigation because tool/reasoning/image rendering cost can introduce separate bottlenecks from the scroll controller and timeline movement path. This PR should stay focused on the current scroll bottleneck:
I am treating heterogeneous assistant/tool/reasoning/image coverage as a follow-up scenario, likely All current checks are green, including @coderabbitai review |
|
✏️ Learnings added
✅ Actions performedFull review triggered. |
Summary
Adds a low-end
session-scroll-reading-longperf scenario for the session timeline.The new scenario:
scrollTopmovement coverage.No product behavior is changed in this PR.
Why
Issue #633 is the first PR in the scroll performance governance series. The current scroll perf probe was too thin to represent the reported long-session reading jank: it used a shorter history, a single short wheel gesture, and no active dock/layout pressure.
This PR only improves measurement coverage so later PRs can make and verify product performance changes against a stronger baseline.
Review follow-up: the long scroll driver now records actual moving samples and distinct scroll positions, uses a dynamic wheel step from timeline height, expands the todo dock before measurement, and keeps wheel coordinates away from the jump-to-latest overlay.
Related Issue
Part of #633.
Human Review Status
Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.
Review Focus
Please focus on whether the new scenario is narrow enough for PR1 and whether the moving-sample assertions make the probe meaningful without making it unnecessarily flaky.
Risk Notes
Low. This is test-only and touches only the Playwright perf probe plus perf profile routing. It does make the low-end perf profile heavier because the new scenario creates three long sessions and samples sustained scrolling.
How To Verify
Commands run:
Screenshots or Recordings
Not applicable. This PR adds perf coverage only and does not change visible UI behavior.
Checklist
dev, and my PR title and commit messages use Conventional Commits in EnglishSummary by CodeRabbit
Tests