test: harden session scroll perf guard#635

Merged
Astro-Han merged 3 commits into dev from codex/i633-scroll-perf-guard on May 15, 2026

@Astro-Han Astro-Han commented May 15, 2026

Summary

Adds a low-end session-scroll-reading-long perf scenario for the session timeline.

The new scenario:

  • Seeds a 104-turn mixed-content session.
  • Forces at least 80 rendered message rows before measurement.
  • Keeps the todo dock expanded and mutates its state during sustained scroll.
  • Drives full-distance downward and upward wheel streams, and checks coverage against actual scrollTop movement.
  • Keeps the timeline moving for the whole 15s+ measurement window by sweeping mid-scroll positions instead of spamming wheel events at the scroll boundaries.
  • Emits the existing perf JSON summary for baseline comparison.
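
To make "actual scrollTop movement coverage" concrete, here is a minimal sketch of how sampled scroll positions could be reduced to moving-sample, distinct-position, and coverage figures. The `ScrollSample` and `CoverageSummary` shapes and the `summarizeCoverage` name are hypothetical and are not taken from the PR's `TimelineMetrics` or `WheelRouteResult` types.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface ScrollSample {
  scrollTop: number;
}

interface CoverageSummary {
  movingSamples: number; // samples whose position differs from the previous sample
  distinctPositions: number; // unique rounded scrollTop values observed
  coverageRatio: number; // (max - min scrollTop) / scrollable distance, clamped to [0, 1]
}

function summarizeCoverage(samples: ScrollSample[], scrollableDistance: number): CoverageSummary {
  if (samples.length === 0 || scrollableDistance <= 0) {
    return { movingSamples: 0, distinctPositions: 0, coverageRatio: 0 };
  }
  // Round to whole pixels so sub-pixel noise does not count as movement.
  const positions = samples.map((s) => Math.round(s.scrollTop));
  let movingSamples = 0;
  for (let i = 1; i < positions.length; i++) {
    if (positions[i] !== positions[i - 1]) movingSamples++;
  }
  const span = Math.max(...positions) - Math.min(...positions);
  return {
    movingSamples,
    distinctPositions: new Set(positions).size,
    coverageRatio: Math.min(1, Math.max(0, span / scrollableDistance)),
  };
}
```

A probe that only spams wheel events at a boundary would report zero moving samples and a coverage ratio of 0 under this definition, which is the failure mode the scenario is meant to exclude.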

No product behavior is changed in this PR.

Why

Issue #633 is the first PR in the scroll performance governance series. The current scroll perf probe was too thin to represent the reported long-session reading jank: it used a shorter history, a single short wheel gesture, and no active dock/layout pressure.

This PR only improves measurement coverage so later PRs can make and verify product performance changes against a stronger baseline.

Review follow-up: the long scroll driver now records actual moving samples and distinct scroll positions, uses a dynamic wheel step from timeline height, expands the todo dock before measurement, and keeps wheel coordinates away from the jump-to-latest overlay.
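
A dynamic wheel step derived from timeline height could look roughly like the sketch below. The ratio, the clamp bounds, and the function names are illustrative assumptions, not values or helpers from the PR.

```typescript
// Illustrative constants only; the PR's tuning values are not shown in this thread.
const STEP_RATIO = 0.35; // fraction of the visible timeline height scrolled per wheel event
const MIN_STEP_PX = 80;
const MAX_STEP_PX = 480;

// Derive the per-event wheel delta from the timeline's visible height, clamped to sane bounds.
function wheelStepFor(viewportHeight: number): number {
  const raw = Math.round(viewportHeight * STEP_RATIO);
  return Math.min(MAX_STEP_PX, Math.max(MIN_STEP_PX, raw));
}

// Number of wheel events needed to cover the full scrollable distance in one direction.
function wheelCountFor(scrollableDistance: number, viewportHeight: number): number {
  return Math.ceil(Math.max(0, scrollableDistance) / wheelStepFor(viewportHeight));
}
```

Scaling the step with the viewport keeps the number of wheel events per pass roughly stable across window sizes, which may help the comparator see similar workloads on base and head.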

Related Issue

Part of #633.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

Please focus on whether the new scenario is narrow enough for PR1 and whether the moving-sample assertions make the probe meaningful without making it unnecessarily flaky.

Risk Notes

Low. This is test-only and touches only the Playwright perf probe plus perf profile routing. It does make the low-end perf profile heavier because the new scenario creates three long sessions and samples sustained scrolling.

How To Verify

Profile routing unit test: 4 passed
packages/app typecheck: passed
Diff check: no whitespace errors
Focused perf probe: 1 passed in about 1.1m
Perf JSON: profile low-end, scenario session-scroll-reading-long, runs 3, window_ms 15006.3, frame_gap_max_ms 65.7, jank_count_50ms 1, cls 0.016
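
For readers unfamiliar with the frame-gap fields, the sketch below shows one plausible way to derive `window_ms`, `frame_gap_max_ms`, and `jank_count_50ms` from a list of frame timestamps. The definition used here (a jank is any gap strictly greater than 50 ms) and the `summarizeFrameGaps` name are assumptions; the probe's actual computation is not shown in this PR description.

```typescript
interface FrameGapSummary {
  window_ms: number;
  frame_gap_max_ms: number;
  jank_count_50ms: number;
}

// frameTimes: ascending timestamps in milliseconds, e.g. collected from requestAnimationFrame callbacks.
function summarizeFrameGaps(frameTimes: number[], jankThresholdMs = 50): FrameGapSummary {
  if (frameTimes.length < 2) {
    return { window_ms: 0, frame_gap_max_ms: 0, jank_count_50ms: 0 };
  }
  const round1 = (n: number): number => Math.round(n * 10) / 10;
  let maxGap = 0;
  let jankCount = 0;
  for (let i = 1; i < frameTimes.length; i++) {
    const gap = frameTimes[i] - frameTimes[i - 1];
    if (gap > maxGap) maxGap = gap;
    // Assumed definition: a gap strictly above the threshold counts as one jank.
    if (gap > jankThresholdMs) jankCount++;
  }
  return {
    window_ms: round1(frameTimes[frameTimes.length - 1] - frameTimes[0]),
    frame_gap_max_ms: round1(maxGap),
    jank_count_50ms: jankCount,
  };
}
```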

Commands run:

bun test ./packages/app/e2e/perf/profiles.unit.ts
bun run typecheck
git diff --check
PLAYWRIGHT_PORT=3101 PAWWORK_PERF_PROFILE=low-end PAWWORK_PERF_BRANCH=i633-local PAWWORK_PERF_OUTPUT=/tmp/pawwork-i633-scroll.json bun run test:e2e:perf -- --project=chromium --grep "session-scroll-reading-long" --reporter=line --workers=1

Screenshots or Recordings

Not applicable. This PR adds perf coverage only and does not change visible UI behavior.

Checklist

  • Human review status is stated above as pending, approved, or not required
  • I linked the related issue, or stated why there is no issue
  • This PR has type, primary area, and priority labels, or I requested maintainer labeling
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant
  • I reviewed the final diff for unrelated changes and suspicious dependency changes
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

Summary by CodeRabbit

Tests

  • Expanded performance testing suite with new long-scroll baseline tests for timeline operations
  • Enhanced scroll profiling capabilities with improved helper functions and metric tracking
  • Added support for testing complex scrolling scenarios with deterministic content generation


coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5aa97889-2de5-434f-bfea-d83b5d5a10b3

📥 Commits

Reviewing files that changed from the base of the PR and between 33d4034 and 7e30d1f.

📒 Files selected for processing (3)
  • packages/app/e2e/perf/perf-probe.spec.ts
  • packages/app/e2e/perf/profiles.ts
  • packages/app/e2e/perf/profiles.unit.ts

📝 Walkthrough

Walkthrough

This PR extends the perf baseline suite with a new long-scroll scenario and supporting measurement infrastructure. A new session-scroll-reading-long scenario is defined and wired to the low-end profile, backed by helper functions for timeline interactions and scroll coverage tracking, with baseline tests that measure extended timeline scrolling behavior.

Changes

Long-Scroll Performance Baseline

| Layer / File(s) | Summary |
| --- | --- |
| Scenario profile definition and gating (packages/app/e2e/perf/profiles.ts, packages/app/e2e/perf/profiles.unit.ts) | PerfScenarioName union adds "session-scroll-reading-long", included in lowEndScenarios allowlist. New test verifies scenario runs only on low-end profile. |
| Long-scroll perf measurement infrastructure (packages/app/e2e/perf/perf-probe.spec.ts) | Imports composer E2E driver utilities (composerEvent, ComposerDriverState, ComposerWindow). Introduces long-scroll tuning constants (turn counts, message thresholds, coverage/probe/sample ratios). Defines TimelineMetrics and WheelRouteResult types for scroll state and wheel-route aggregation. Implements helpers for timeline lane hover, metrics reading, long-scroll markdown content generation, session seeding, content reveal, composer driver installation/persistence, and wheel-driven scroll probing with coverage tracking. |
| Timeline scroll baseline tests (packages/app/e2e/perf/perf-probe.spec.ts) | Updates session-scroll-reading baseline to call hoverTimelineScrollLane helper. Adds new session-scroll-reading-long baseline that seeds long-scroll session, installs composer todo UI, drives wheel down/up with coverage/sample assertions, sustains movement sampling, snapshots perf output, and records baseline metrics. |
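
The profile gating summarized in the walkthrough can be pictured with the sketch below. `PerfScenarioName` and `lowEndScenarios` are names mentioned in the review; their shapes here (an abbreviated union and a `Set`), the `PerfProfileName` type, and the `shouldRunScenario` helper are assumptions made for illustration.

```typescript
// Assumed profile names, matching the two profiles that appear in the perf output.
type PerfProfileName = "default" | "low-end";

// Abbreviated union for illustration; the real type lists more scenarios.
type PerfScenarioName =
  | "session-scroll-reading"
  | "session-scroll-reading-long"
  | "session-timeline-recompute"
  | "concurrent-shimmer-extreme";

// Assumed representation of the low-end allowlist.
const lowEndScenarios: ReadonlySet<PerfScenarioName> = new Set<PerfScenarioName>([
  "session-scroll-reading-long",
  "session-timeline-recompute",
  "concurrent-shimmer-extreme",
]);

// Hypothetical routing rule: allowlisted scenarios run only on low-end, all others only on default.
function shouldRunScenario(profile: PerfProfileName, scenario: PerfScenarioName): boolean {
  const isLowEndOnly = lowEndScenarios.has(scenario);
  return profile === "low-end" ? isLowEndOnly : !isLowEndOnly;
}
```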

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • Astro-Han/pawwork#633: Both changes directly extend the session-scroll-reading scenario in packages/app/e2e/perf/perf-probe.spec.ts with long-scroll seeding and wheel-driving measurement routines.

Possibly related PRs

  • Astro-Han/pawwork#607: Both PRs extend the Playwright perf baseline suite around packages/app/e2e/perf/perf-probe.spec.ts and scenario wiring in packages/app/e2e/perf/profiles.ts, with this PR adding the session-scroll-reading-long scenario and updating the existing session-scroll-reading flow.
  • Astro-Han/pawwork#610: This PR's addition of the session-scroll-reading-long perf scenario wired into the lowEndScenarios profile directly uses the low-end perf-profile framework introduced in the related PR.
  • Astro-Han/pawwork#608: Both PRs modify packages/app/e2e/perf/perf-probe.spec.ts to change baseline behavior; PR#608 adds base/head tagging and cooldown while this PR expands helper routines and adds the new long-scroll scenario.

Poem

🐰 A long scroll through the timeline flows,
With wheels that spin and metrics that grow,
Low-end profiles now have their test,
Perf measurement put to the test! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test: harden session scroll perf guard' is specific and clearly describes the main change: adding and hardening a performance test scenario for session scroll behavior. |
| Description check | ✅ Passed | The PR description is comprehensive and follows the template well, including all key sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify with specific test results, and a completed checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@github-actions github-actions Bot added the app (Application behavior and product flows) label May 15, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P3 (only low-risk paths changed (packages/app/e2e/perf/perf-probe.spec.ts, packages/app/e2e/perf/profiles.ts, packages/app/e2e/perf/profiles.unit.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@Astro-Han Astro-Han added the task (Maintainer or agent execution task), ci (Continuous integration / GitHub Actions), and P1 (High priority) labels May 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new performance test scenario, session-scroll-reading-long, specifically for low-end profiles to measure timeline rendering and scroll performance. The implementation includes utilities for seeding long sessions, simulating wheel events, and injecting state updates via a custom composer driver. Feedback focuses on improving the reliability of polling assertions with fallback values and correcting a timing mismatch in the scroll pulse checkpoints where the final event was unreachable within the defined duration.
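
The two feedback points can be illustrated with a small sketch: checkpoints spaced strictly inside the measurement duration so that the last one is always reachable, and a fallback for polled values that may be missing. All names here are hypothetical, and the actual fix applied in the PR is not shown in this thread.

```typescript
// Evenly spaced checkpoints strictly inside (0, durationMs), so the final one fires before the window ends.
function pulseCheckpoints(durationMs: number, count: number): number[] {
  if (durationMs <= 0 || count <= 0) return [];
  const spacing = durationMs / (count + 1);
  return Array.from({ length: count }, (_, i) => Math.round(spacing * (i + 1)));
}

// Checkpoints at or beyond the duration can never fire inside the window.
function unreachableCheckpoints(checkpoints: number[], durationMs: number): number[] {
  return checkpoints.filter((t) => t >= durationMs);
}

// Fallback for polled metrics: return a known number instead of undefined or NaN.
function finiteOr(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}
```

With a 15000 ms window, a hand-written schedule of 5000, 10000, and 15000 ms would leave the last checkpoint unreachable, whereas `pulseCheckpoints(15000, 4)` stays inside the window.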

Comment thread packages/app/e2e/perf/perf-probe.spec.ts Outdated
Comment thread packages/app/e2e/perf/perf-probe.spec.ts Outdated
@github-actions github-actions Bot removed the ci (Continuous integration / GitHub Actions) label May 15, 2026
…-guard

# Conflicts:
#	packages/app/e2e/perf/profiles.ts
#	packages/app/e2e/perf/profiles.unit.ts
@github-actions

Perf delta summary

Comparator: pass

| Profile / Scenario | interaction median | interaction worst | long task max | tbt | frame gap p95 | frame gap max | jank count | cls | status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default / homepage-cold | 32 -> 24 (-8) | 48 -> 40 (-8) | 71 -> 67 (-4) | 21 -> 17 (-4) | 16.8 -> 16.8 (0) | 133.4 -> 116.7 (-16.7) | 4 -> 3 (-1) | 0 -> 0 (0) | pass |
| default / long-session-input-lag | 48 -> 48 (0) | 48 -> 48 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.7 -> 16.7 (0) | 16.7 -> 16.8 (+0.1) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| default / session-streaming-long | 48 -> 56 (+8) | 64 -> 64 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 33.3 -> 33.3 (0) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| default / tool-call-expand | 16 -> 16 (0) | 24 -> 16 (-8) | 0 -> 0 (0) | 0 -> 0 (0) | 16.7 -> 16.7 (0) | 16.7 -> 16.7 (0) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| default / tool-default-open-heavy-bash | 24 -> 24 (0) | 32 -> 32 (0) | 64 -> 65 (+1) | 14 -> 15 (+1) | 50 -> 50 (0) | 133.3 -> 100 (-33.3) | 2 -> 3 (+1) | 0 -> 0 (0) | pass |
| default / terminal-side-panel-open | 48 -> 40 (-8) | 48 -> 40 (-8) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 33.3 (+16.5) | 33.2 -> 33.3 (+0.1) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
| default / session-scroll-reading | 24 -> 24 (0) | 24 -> 24 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.7 -> 16.7 (0) | 16.7 -> 16.7 (0) | 0 -> 0 (0) | 0.505 -> 0.505 (0) | warn: cls |
| low-end / session-scroll-reading-long | 56 -> 48 (-8) | 64 -> 64 (0) | 84 -> 79 (-5) | 61 -> 53 (-8) | 16.8 -> 16.8 (0) | 150 -> 83.3 (-66.7) | 4 -> 3 (-1) | 0.016 -> 0.016 (0) | pass |
| low-end / session-timeline-recompute | 112 -> 120 (+8) | 120 -> 120 (0) | 100 -> 109 (+9) | 155 -> 198 (+43) | 83.4 -> 99.9 (+16.5) | 166.7 -> 166.5 (-0.2) | 3 -> 3 (0) | 0.081 -> 0.081 (0) | pass |
| low-end / concurrent-shimmer-extreme | 0 -> 0 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 0 -> 0 (0) | 16.8 -> 16.8 (0) | 33.3 -> 16.8 (-16.5) | 0 -> 0 (0) | 0 -> 0 (0) | pass |
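
Each cell in the summary reads as `base -> head (delta)`. A minimal sketch of producing that notation is shown below; the `formatDelta` name and the three-decimal rounding are assumptions rather than the comparator's actual code.

```typescript
// Format a base/head pair in the "base -> head (delta)" notation used in the summary.
function formatDelta(base: number, head: number): string {
  // Round to three decimals so floating-point noise (e.g. 16.700000000000003) does not leak into output.
  const delta = Math.round((head - base) * 1000) / 1000;
  // Negative numbers already carry their sign; zero is printed bare.
  const signed = delta > 0 ? `+${delta}` : `${delta}`;
  return `${base} -> ${head} (${signed})`;
}
```

For example, `formatDelta(133.4, 116.7)` yields `133.4 -> 116.7 (-16.7)`, matching the frame gap max cell for default / homepage-cold.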

@Astro-Han

Review response on the heterogeneous message fixture point:

I am intentionally keeping that out of this PR.

The concern is technically valid: a future guard should cover real assistant/tool/reasoning/image message parts, not only long user text turns. But adding those fixtures here would change the goal of PR1 from "establish a stable long-scroll measurement baseline" to "cover every expensive message-part renderer." That makes attribution worse for the current scroll investigation because tool/reasoning/image rendering cost can introduce separate bottlenecks from the scroll controller and timeline movement path.

This PR should stay focused on the current scroll bottleneck:

  • long-session scroll distance and movement coverage
  • expanded todo dock layout pressure
  • 15s+ low-end perf sampling
  • base/head comparator stability

I am treating heterogeneous assistant/tool/reasoning/image coverage as a follow-up scenario, likely session-scroll-reading-heterogeneous, after this baseline is merged. That keeps this PR reviewable and keeps later product performance PRs from mixing scroll regressions with unrelated message-part rendering costs.

All current checks are green, including perf-probe-baseline. Please re-review the remaining diff.

@coderabbitai review


coderabbitai Bot commented May 15, 2026

@Astro-Han The scoping rationale is well-reasoned. Separating scroll controller / timeline movement cost from message-part rendering cost keeps attribution clean for the current investigation, and staging session-scroll-reading-heterogeneous as a follow-up after this baseline is merged is the right sequencing. I'll re-review the remaining diff now.


✏️ Learnings added
Learnt from: Astro-Han
URL: https://github.com/Astro-Han/pawwork/pull/635

Timestamp: 2026-05-15T10:47:34.985Z
Learning: In the Pawwork repository, the perf scenario `session-scroll-reading-long` (PR `#635`) intentionally uses only long user-text turns and excludes heterogeneous assistant/tool/reasoning/image message parts. The exclusion is deliberate: mixing message-part rendering cost into the baseline would hurt attribution for the scroll controller / timeline movement investigation. A follow-up scenario named `session-scroll-reading-heterogeneous` is planned after the baseline is merged.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

✅ Actions performed

Full review triggered.

@Astro-Han Astro-Han merged commit 84bb3ab into dev May 15, 2026
27 checks passed
@Astro-Han Astro-Han deleted the codex/i633-scroll-perf-guard branch May 15, 2026 11:07

Labels

  • app: Application behavior and product flows
  • ci: Continuous integration / GitHub Actions
  • P1: High priority
  • task: Maintainer or agent execution task
