test: harden session scroll perf guard by Astro-Han · Pull Request #635 · Astro-Han/pawwork

Astro-Han · 2026-05-15T10:07:42Z

Summary

Adds a low-end session-scroll-reading-long perf scenario for the session timeline.

The new scenario:

Seeds a 104-turn mixed-content session.
Forces at least 80 rendered message rows before measurement.
Keeps the todo dock expanded and mutates its state during sustained scroll.
Drives full-distance down and up wheel streams using actual scrollTop movement coverage.
Keeps the 15s+ measurement window moving with mid-scroll sweep coverage instead of boundary wheel spam.
Emits the existing perf JSON summary for baseline comparison.
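The "actual scrollTop movement coverage" idea in the list above can be illustrated with a small summary over sampled scroll positions. This is a hypothetical sketch, not the probe's real helper: `ScrollSample`, `summarizeCoverage`, and the 1px movement threshold are illustrative assumptions. The point it shows is that a wheel stream stuck at a boundary still produces samples but adds no movement or new positions.

```typescript
// Hypothetical sample recorded on each animation frame while the wheel stream runs.
interface ScrollSample {
  timeMs: number;
  scrollTop: number;
}

interface CoverageSummary {
  movingSamples: number; // samples whose scrollTop moved at least 1px since the previous one
  distinctPositions: number; // unique scrollTop values, rounded to whole pixels
  coverage: number; // visited span (max - min scrollTop) as a fraction of the scrollable range
}

function summarizeCoverage(
  samples: ScrollSample[],
  scrollHeight: number,
  clientHeight: number,
): CoverageSummary {
  const range = Math.max(0, scrollHeight - clientHeight);
  const positions = new Set<number>();
  let movingSamples = 0;
  let min = Infinity;
  let max = -Infinity;
  let prev: number | undefined;
  for (const { scrollTop } of samples) {
    positions.add(Math.round(scrollTop));
    if (prev !== undefined && Math.abs(scrollTop - prev) >= 1) movingSamples++;
    min = Math.min(min, scrollTop);
    max = Math.max(max, scrollTop);
    prev = scrollTop;
  }
  const coverage = range === 0 || samples.length === 0 ? 0 : Math.min(1, (max - min) / range);
  return { movingSamples, distinctPositions: positions.size, coverage };
}

// Usage: the repeated 0 and 1000 samples count as samples but not as movement.
const summary = summarizeCoverage(
  [0, 0, 500, 1000, 1000, 500].map((scrollTop, i) => ({ timeMs: i * 16, scrollTop })),
  2000,
  1000,
);
console.log(summary); // { movingSamples: 3, distinctPositions: 3, coverage: 1 }
```

Assertions on `movingSamples` and `distinctPositions` catch a probe that "scrolls" without moving, which a plain sample count would not.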

No product behavior is changed in this PR.

Why

This PR is the first in the scroll performance governance series tracked by issue #633. The existing scroll perf probe was too thin to represent the reported long-session reading jank: it used a shorter history, a single short wheel gesture, and no active dock or layout pressure.

This PR only improves measurement coverage so later PRs can make and verify product performance changes against a stronger baseline.

Review follow-up: the long scroll driver now records actual moving samples and distinct scroll positions, derives the wheel step from the timeline height, expands the todo dock before measurement, and keeps wheel coordinates away from the jump-to-latest overlay.
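The review follow-up mentions a wheel step derived from timeline height and wheel coordinates that avoid the jump-to-latest overlay. A minimal sketch of both ideas follows; `wheelStep`, `wheelPoint`, the 0.6 ratio, the clamp bounds, and the 24px margin are all hypothetical choices, not values taken from the PR. The `Rect` shape matches what Playwright's `locator.boundingBox()` returns.

```typescript
// Same shape as the object returned by Playwright's locator.boundingBox().
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical: scale the wheel delta with the visible timeline height, clamped
// so very short or very tall viewports still produce a sensible step.
function wheelStep(timelineHeight: number, ratio = 0.6, minPx = 120, maxPx = 900): number {
  return Math.round(Math.min(maxPx, Math.max(minPx, timelineHeight * ratio)));
}

// Hypothetical: aim at the horizontal center of the timeline and the vertical center
// of the region above the overlay, keeping a margin from both edges.
function wheelPoint(timeline: Rect, overlay: Rect | null, marginPx = 24): { x: number; y: number } {
  const timelineBottom = timeline.y + timeline.height;
  const bottom = overlay ? Math.min(timelineBottom, overlay.y) : timelineBottom;
  const centerY = (timeline.y + bottom) / 2;
  const y = Math.max(timeline.y + marginPx, Math.min(centerY, bottom - marginPx));
  return { x: timeline.x + timeline.width / 2, y };
}

const timeline: Rect = { x: 0, y: 0, width: 800, height: 1000 };
const overlay: Rect = { x: 300, y: 900, width: 200, height: 60 };
console.log(wheelStep(timeline.height), wheelPoint(timeline, overlay)); // 600 { x: 400, y: 450 }
```

In a Playwright test the point would typically be passed to `page.mouse.move(x, y)` before a series of `page.mouse.wheel(0, step)` calls, so every wheel event lands on the timeline rather than on the overlay button.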

Related Issue

Part of #633.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

Please focus on whether the new scenario is narrow enough for PR1 and whether the moving-sample assertions make the probe meaningful without making it unnecessarily flaky.

Risk Notes

Low. This is test-only and touches only the Playwright perf probe plus perf profile routing. It does make the low-end perf profile heavier because the new scenario creates three long sessions and samples sustained scrolling.

How To Verify

Profile routing unit test: 4 passed
packages/app typecheck: passed
Diff check: no whitespace errors
Focused perf probe: 1 passed in about 1.1m
Perf JSON: profile low-end, scenario session-scroll-reading-long, runs 3, window_ms 15006.3, frame_gap_max_ms 65.7, jank_count_50ms 1, cls 0.016
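The perf JSON above reports frame-gap figures such as `frame_gap_max_ms` and `jank_count_50ms`. Assuming they are derived from consecutive `requestAnimationFrame` timestamps (the suite's real definitions may differ), they could be computed roughly as below. `frameGapStats`, the nearest-rank p95, and the strict `> 50` comparison are assumptions for illustration.

```typescript
interface FrameGapStats {
  frameGapP95Ms: number;
  frameGapMaxMs: number;
  jankCount50Ms: number;
}

// Hypothetical: derive frame-gap metrics from a list of frame timestamps in ms.
function frameGapStats(frameTimestampsMs: number[], jankThresholdMs = 50): FrameGapStats {
  const gaps: number[] = [];
  for (let i = 1; i < frameTimestampsMs.length; i++) {
    gaps.push(frameTimestampsMs[i]! - frameTimestampsMs[i - 1]!);
  }
  if (gaps.length === 0) return { frameGapP95Ms: 0, frameGapMaxMs: 0, jankCount50Ms: 0 };
  const sorted = [...gaps].sort((a, b) => a - b);
  // Nearest-rank percentile: the smallest gap with at least 95% of gaps at or below it.
  const rank = Math.ceil(0.95 * sorted.length) - 1;
  return {
    frameGapP95Ms: sorted[rank]!,
    frameGapMaxMs: sorted[sorted.length - 1]!,
    jankCount50Ms: gaps.filter((gap) => gap > jankThresholdMs).length,
  };
}

// One 66ms hitch among otherwise ~16ms frames counts as a single jank.
const stats = frameGapStats([0, 16, 32, 48, 114, 130]);
console.log(stats); // { frameGapP95Ms: 66, frameGapMaxMs: 66, jankCount50Ms: 1 }
```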

Commands run:

bun test ./packages/app/e2e/perf/profiles.unit.ts
bun run typecheck
git diff --check
PLAYWRIGHT_PORT=3101 PAWWORK_PERF_PROFILE=low-end PAWWORK_PERF_BRANCH=i633-local PAWWORK_PERF_OUTPUT=/tmp/pawwork-i633-scroll.json bun run test:e2e:perf -- --project=chromium --grep "session-scroll-reading-long" --reporter=line --workers=1

Screenshots or Recordings

Not applicable. This PR adds perf coverage only and does not change visible UI behavior.

Checklist

Summary by CodeRabbit

Tests

Expanded performance testing suite with new long-scroll baseline tests for timeline operations
Enhanced scroll profiling capabilities with improved helper functions and metric tracking
Added support for testing complex scrolling scenarios with deterministic content generation
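"Deterministic content generation" usually means seeding a small PRNG so that every run lays out identical markdown, which keeps base and head runs comparable. The sketch below is hypothetical: the `mulberry32` PRNG is a well-known public-domain generator swapped in for illustration, and `generateSession`, the block kinds, and the seed 633 are assumptions, not the PR's fixture code.

```typescript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Block = "paragraph" | "list" | "code" | "table";
const BLOCKS: Block[] = ["paragraph", "list", "code", "table"];
const FENCE = "`".repeat(3); // built at runtime so it does not close this document's fence

// Hypothetical renderer: 2 to 5 lines or items per block, chosen by the PRNG.
function renderBlock(kind: Block, turn: number, rand: () => number): string {
  const n = 2 + Math.floor(rand() * 4);
  const rows = (f: (i: number) => string) => Array.from({ length: n }, (_, i) => f(i));
  switch (kind) {
    case "paragraph":
      return rows((i) => `Turn ${turn} sentence ${i} about long-session reading.`).join(" ");
    case "list":
      return rows((i) => `- item ${i} of turn ${turn}`).join("\n");
    case "code":
      return [FENCE + "ts", ...rows((i) => `const v${i} = ${turn * 10 + i};`), FENCE].join("\n");
    case "table":
      return ["| col | value |", "| --- | --- |", ...rows((i) => `| ${i} | ${turn} |`)].join("\n");
  }
}

// Hypothetical: one markdown string per turn, identical for identical seeds.
function generateSession(turns: number, seed = 633): string[] {
  const rand = mulberry32(seed);
  const out: string[] = [];
  for (let turn = 0; turn < turns; turn++) {
    const parts = [`## Turn ${turn}`];
    const blocks = 1 + Math.floor(rand() * 3);
    for (let b = 0; b < blocks; b++) {
      parts.push(renderBlock(BLOCKS[Math.floor(rand() * BLOCKS.length)]!, turn, rand));
    }
    out.push(parts.join("\n\n"));
  }
  return out;
}

const session = generateSession(104);
console.log(session.length); // 104
```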

coderabbitai · 2026-05-15T10:07:50Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5aa97889-2de5-434f-bfea-d83b5d5a10b3

📥 Commits

Reviewing files that changed from the base of the PR and between 33d4034 and 7e30d1f.

📒 Files selected for processing (3)

packages/app/e2e/perf/perf-probe.spec.ts
packages/app/e2e/perf/profiles.ts
packages/app/e2e/perf/profiles.unit.ts

📝 Walkthrough


This PR extends the perf baseline suite with a new long-scroll scenario and supporting measurement infrastructure. A new session-scroll-reading-long scenario is defined and wired to the low-end profile, backed by helper functions for timeline interactions and scroll coverage tracking, with baseline tests that measure extended timeline scrolling behavior.

Changes

Long-Scroll Performance Baseline

Layer / File(s)	Summary
Scenario profile definition and gating `packages/app/e2e/perf/profiles.ts`, `packages/app/e2e/perf/profiles.unit.ts`	`PerfScenarioName` union adds `"session-scroll-reading-long"`, which is included in the `lowEndScenarios` allowlist. A new unit test verifies the scenario runs only on the low-end profile.
Long-scroll perf measurement infrastructure `packages/app/e2e/perf/perf-probe.spec.ts`	Imports composer E2E driver utilities (`composerEvent`, `ComposerDriverState`, `ComposerWindow`). Introduces long-scroll tuning constants (turn counts, message thresholds, coverage/probe/sample ratios). Defines `TimelineMetrics` and `WheelRouteResult` types for scroll state and wheel-route aggregation. Implements helpers for timeline lane hover, metrics reading, long-scroll markdown content generation, session seeding, content reveal, composer driver installation/persistence, and wheel-driven scroll probing with coverage tracking.
Timeline scroll baseline tests `packages/app/e2e/perf/perf-probe.spec.ts`	Updates `session-scroll-reading` baseline to call `hoverTimelineScrollLane` helper. Adds new `session-scroll-reading-long` baseline that seeds long-scroll session, installs composer todo UI, drives wheel down/up with coverage/sample assertions, sustains movement sampling, snapshots perf output, and records baseline metrics.
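The profile gating in the first row of the table above can be pictured as an allowlist lookup. This is a hypothetical sketch: only `PerfScenarioName`, `lowEndScenarios`, and the scenario names (taken from the perf delta table on this PR) come from the source. The `Set` representation, `PerfProfile`, and `scenarioRunsOn` are assumptions, and the union is abbreviated.

```typescript
type PerfProfile = "default" | "low-end";

// Abbreviated: scenario names as they appear in the perf delta summary.
type PerfScenarioName =
  | "homepage-cold"
  | "session-scroll-reading"
  | "session-scroll-reading-long"
  | "session-timeline-recompute"
  | "concurrent-shimmer-extreme";

// Hypothetical representation of the low-end allowlist.
const lowEndScenarios: ReadonlySet<PerfScenarioName> = new Set<PerfScenarioName>([
  "session-scroll-reading-long",
  "session-timeline-recompute",
  "concurrent-shimmer-extreme",
]);

// Hypothetical routing: low-end runs only allowlisted scenarios; default runs the rest.
function scenarioRunsOn(profile: PerfProfile, scenario: PerfScenarioName): boolean {
  const lowEndOnly = lowEndScenarios.has(scenario);
  return profile === "low-end" ? lowEndOnly : !lowEndOnly;
}

console.log(scenarioRunsOn("low-end", "session-scroll-reading-long")); // true
console.log(scenarioRunsOn("default", "session-scroll-reading-long")); // false
```

This split matches the delta table, where the default and low-end profiles report disjoint scenario sets, but the real routing in `profiles.ts` may be structured differently.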

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Astro-Han/pawwork#633: Both changes directly extend the session-scroll-reading scenario in packages/app/e2e/perf/perf-probe.spec.ts with long-scroll seeding and wheel-driving measurement routines.

Possibly related PRs

Astro-Han/pawwork#607: Both PRs extend the Playwright perf baseline suite around packages/app/e2e/perf/perf-probe.spec.ts and scenario wiring in packages/app/e2e/perf/profiles.ts, with this PR adding the session-scroll-reading-long scenario and updating the existing session-scroll-reading flow.
Astro-Han/pawwork#610: This PR's addition of the session-scroll-reading-long perf scenario wired into the lowEndScenarios profile directly uses the low-end perf-profile framework introduced in the related PR.
Astro-Han/pawwork#608: Both PRs modify packages/app/e2e/perf/perf-probe.spec.ts to change baseline behavior; PR#608 adds base/head tagging and cooldown while this PR expands helper routines and adds the new long-scroll scenario.

Poem

🐰 A long scroll through the timeline flows,
With wheels that spin and metrics that grow,
Low-end profiles now have their test,
Perf measurement put to the test! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'test: harden session scroll perf guard' is specific and clearly describes the main change: adding and hardening a performance test scenario for session scroll behavior.
Description check	✅ Passed	The PR description is comprehensive and follows the template well, including all key sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify with specific test results, and a completed checklist.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


github-actions

Suggested priority: P3 (only low-risk paths changed (packages/app/e2e/perf/perf-probe.spec.ts, packages/app/e2e/perf/profiles.ts, packages/app/e2e/perf/profiles.unit.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

gemini-code-assist

Code Review

This pull request introduces a new performance test scenario, session-scroll-reading-long, specifically for low-end profiles to measure timeline rendering and scroll performance. The implementation includes utilities for seeding long sessions, simulating wheel events, and injecting state updates via a custom composer driver. Feedback focuses on improving the reliability of polling assertions with fallback values and correcting a timing mismatch in the scroll pulse checkpoints where the final event was unreachable within the defined duration.
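One common way a "final event unreachable within the defined duration" mismatch arises is spacing checkpoints as `i * durationMs / (count - 1)`, which puts the last checkpoint exactly at `durationMs`; a loop that runs while `elapsed < durationMs` never reaches it. The sketch below is a hypothetical fix, not the PR's code: `pulseCheckpoints` spaces checkpoints so that every one falls strictly before the end of the window. For the separate polling point, Playwright's `expect.poll` can wrap a metric read that coalesces a missing value (for example with `?? 0`), so the assertion retries instead of failing on `undefined`.

```typescript
// Hypothetical: spread `count` checkpoints evenly so that each lies in [0, durationMs),
// never at or past the end of the measurement window.
function pulseCheckpoints(durationMs: number, count: number): number[] {
  if (count <= 0 || durationMs <= 0) return [];
  const step = durationMs / (count + 1);
  // step * (i + 1) <= step * count < durationMs, and flooring cannot push it higher.
  return Array.from({ length: count }, (_, i) => Math.floor(step * (i + 1)));
}

console.log(pulseCheckpoints(15000, 4)); // [ 3000, 6000, 9000, 12000 ]
```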


github-actions · 2026-05-15T10:43:53Z

Perf delta summary

Comparator: pass

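Each cell shows base -> head (delta). Timings are in milliseconds; jank count is a count and CLS is unitless.

Profile / Scenario	interaction median (ms)	interaction worst (ms)	long task max (ms)	tbt (ms)	frame gap p95 (ms)	frame gap max (ms)	jank count	cls	status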
default / homepage-cold	32 -> 24 (-8)	48 -> 40 (-8)	71 -> 67 (-4)	21 -> 17 (-4)	16.8 -> 16.8 (0)	133.4 -> 116.7 (-16.7)	4 -> 3 (-1)	0 -> 0 (0)	pass
default / long-session-input-lag	48 -> 48 (0)	48 -> 48 (0)	0 -> 0 (0)	0 -> 0 (0)	16.7 -> 16.7 (0)	16.7 -> 16.8 (+0.1)	0 -> 0 (0)	0 -> 0 (0)	pass
default / session-streaming-long	48 -> 56 (+8)	64 -> 64 (0)	0 -> 0 (0)	0 -> 0 (0)	16.8 -> 16.8 (0)	33.3 -> 33.3 (0)	0 -> 0 (0)	0 -> 0 (0)	pass
default / tool-call-expand	16 -> 16 (0)	24 -> 16 (-8)	0 -> 0 (0)	0 -> 0 (0)	16.7 -> 16.7 (0)	16.7 -> 16.7 (0)	0 -> 0 (0)	0 -> 0 (0)	pass
default / tool-default-open-heavy-bash	24 -> 24 (0)	32 -> 32 (0)	64 -> 65 (+1)	14 -> 15 (+1)	50 -> 50 (0)	133.3 -> 100 (-33.3)	2 -> 3 (+1)	0 -> 0 (0)	pass
default / terminal-side-panel-open	48 -> 40 (-8)	48 -> 40 (-8)	0 -> 0 (0)	0 -> 0 (0)	16.8 -> 33.3 (+16.5)	33.2 -> 33.3 (+0.1)	0 -> 0 (0)	0 -> 0 (0)	pass
default / session-scroll-reading	24 -> 24 (0)	24 -> 24 (0)	0 -> 0 (0)	0 -> 0 (0)	16.7 -> 16.7 (0)	16.7 -> 16.7 (0)	0 -> 0 (0)	0.505 -> 0.505 (0)	warn: cls
low-end / session-scroll-reading-long	56 -> 48 (-8)	64 -> 64 (0)	84 -> 79 (-5)	61 -> 53 (-8)	16.8 -> 16.8 (0)	150 -> 83.3 (-66.7)	4 -> 3 (-1)	0.016 -> 0.016 (0)	pass
low-end / session-timeline-recompute	112 -> 120 (+8)	120 -> 120 (0)	100 -> 109 (+9)	155 -> 198 (+43)	83.4 -> 99.9 (+16.5)	166.7 -> 166.5 (-0.2)	3 -> 3 (0)	0.081 -> 0.081 (0)	pass
low-end / concurrent-shimmer-extreme	0 -> 0 (0)	0 -> 0 (0)	0 -> 0 (0)	0 -> 0 (0)	16.8 -> 16.8 (0)	33.3 -> 16.8 (-16.5)	0 -> 0 (0)	0 -> 0 (0)	pass
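Each cell in the table above reads `base -> head (delta)`. A hypothetical formatter and CLS check that reproduce that layout are sketched below. `formatDelta`, `clsStatus`, and the one-decimal rounding are assumptions; the 0.1 threshold is borrowed from the Web Vitals "good" CLS boundary and is only a guess at how the comparator flags `warn: cls` (it is consistent with 0.505 warning while 0.081 and 0.016 pass).

```typescript
// Hypothetical: render one table cell as "base -> head (delta)", rounding the delta
// to one decimal place to hide floating-point noise such as 0.10000000000000142.
function formatDelta(base: number, head: number): string {
  const delta = Math.round((head - base) * 10) / 10;
  const sign = delta > 0 ? "+" : "";
  return `${base} -> ${head} (${sign}${delta})`;
}

// Hypothetical: flag layout shift on the head run above an assumed threshold.
function clsStatus(headCls: number, threshold = 0.1): "pass" | "warn: cls" {
  return headCls > threshold ? "warn: cls" : "pass";
}

console.log(formatDelta(32, 24)); // 32 -> 24 (-8)
console.log(formatDelta(16.7, 16.8)); // 16.7 -> 16.8 (+0.1)
console.log(clsStatus(0.505)); // warn: cls
```

A warning here does not fail the comparator, which matches the table: `session-scroll-reading` shows `warn: cls` while the overall comparator result is still `pass`.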

Astro-Han · 2026-05-15T10:47:15Z

Review response on the heterogeneous message fixture point:

I am intentionally keeping that out of this PR.

The concern is technically valid: a future guard should cover real assistant/tool/reasoning/image message parts, not only long user text turns. But adding those fixtures here would change the goal of PR1 from "establish a stable long-scroll measurement baseline" to "cover every expensive message-part renderer." That would make attribution worse for the current scroll investigation, because tool/reasoning/image rendering can introduce bottlenecks that are separate from the scroll controller and timeline movement path.

This PR should stay focused on the current scroll bottleneck:

long-session scroll distance and movement coverage
expanded todo dock layout pressure
15s+ low-end perf sampling
base/head comparator stability

I am treating heterogeneous assistant/tool/reasoning/image coverage as a follow-up scenario, likely session-scroll-reading-heterogeneous, after this baseline is merged. That keeps this PR reviewable and keeps later product performance PRs from mixing scroll regressions with unrelated message-part rendering costs.

All current checks are green, including perf-probe-baseline. Please re-review the remaining diff.

@coderabbitai review

coderabbitai · 2026-05-15T10:47:36Z

@Astro-Han The scoping rationale is well-reasoned. Separating scroll controller / timeline movement cost from message-part rendering cost keeps attribution clean for the current investigation, and staging session-scroll-reading-heterogeneous as a follow-up after this baseline is merged is the right sequencing. I'll re-review the remaining diff now.

✏️ Learnings added

Learnt from: Astro-Han
URL: https://github.com/Astro-Han/pawwork/pull/635

Timestamp: 2026-05-15T10:47:34.985Z
Learning: In the Pawwork repository, the perf scenario `session-scroll-reading-long` (PR `#635`) intentionally uses only long user-text turns and excludes heterogeneous assistant/tool/reasoning/image message parts. The exclusion is deliberate: mixing message-part rendering cost into the baseline would hurt attribution for the scroll controller / timeline movement investigation. A follow-up scenario named `session-scroll-reading-heterogeneous` is planned after the baseline is merged.


✅ Actions performed

Full review triggered.

test: harden session scroll perf guard

7d55b44

github-actions Bot added the app (Application behavior and product flows) label May 15, 2026

github-actions Bot reviewed May 15, 2026


Astro-Han added the task (Maintainer or agent execution task), ci (Continuous integration / GitHub Actions), and P1 (High priority) labels May 15, 2026

gemini-code-assist Bot reviewed May 15, 2026


Comment thread packages/app/e2e/perf/perf-probe.spec.ts Outdated

Comment thread packages/app/e2e/perf/perf-probe.spec.ts Outdated

test: stabilize session scroll perf guard

bdf8ec7

github-actions Bot removed the ci (Continuous integration / GitHub Actions) label May 15, 2026

Merge remote-tracking branch 'origin/dev' into codex/i633-scroll-perf…

7e30d1f

…-guard # Conflicts: # packages/app/e2e/perf/profiles.ts # packages/app/e2e/perf/profiles.unit.ts

Astro-Han added the ci (Continuous integration / GitHub Actions) label May 15, 2026

Astro-Han mentioned this pull request May 15, 2026

[Task] Harden session-scroll-reading perf scenario: thin coverage misses real-world stutter #633

Open

Astro-Han mentioned this pull request May 15, 2026

[Bug] Session UI typing lag and scroll stuttering #615

Open

Astro-Han merged commit 84bb3ab into dev May 15, 2026
27 checks passed

Astro-Han deleted the codex/i633-scroll-perf-guard branch May 15, 2026 11:07

Astro-Han mentioned this pull request May 16, 2026

[Task] UI rewrite v2 PR0: perf-gated regression CI #600

Closed

coderabbitai Bot mentioned this pull request May 16, 2026

test(app): add heterogeneous scroll perf fixture #664

Merged

11 tasks

Astro-Han merged 3 commits into dev from codex/i633-scroll-perf-guard
1 participant