Use this for maintainer or agent execution work that is not a user-facing bug report or feature request.
Keep the issue concrete enough that an agent can understand the target, scope, and verification path.
Goal
Strengthen W1 regression coverage so that more of the current manual handtest becomes enforced by CI, and the remaining human pass is focused on true design judgment instead of repeatable renderer checks.
When this task is done, the missing W1 regression coverage is documented and implemented in the correct layers: behavior assertions, computed-style assertions, checklist segmentation, and correct failure-command guidance.
Scope
In scope:
- Add E2E coverage for the remaining W1 behavior assertions that are still being checked manually.
- Add lower-cost computed-style assertions where screenshot diff is unnecessary.
- Split the local W1 checklist into “CI must pass first” vs “5-minute visual pass”.
- Document correct stderr-producing commands for tool-failure tests.
Detailed scope:
A. Behavior assertions
Add E2E specs for scenarios currently relied on by manual handtest:
- scroll: reading_history persistence across content_resize / dock_resize
- scroll: weak trackpad gesture (multiple low-delta wheels) still demotes to reading_history
- scroll: nested raw tool output gesture isolation from parent timeline
- chevron: 12px size + collapsed-right / expanded-down orientation
- thinking indicator: only renders when working && assistantVisible === 0; disappears once any reasoning / prose / tool appears
- bubble selectability: user-select: text on user-message-text / bubble-text / agent-prose / agent-reasoning subtrees
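For the weak trackpad scenario, one way to drive the gesture is to split a single scroll distance into many low-delta wheel steps. The following is a minimal sketch only: the helper name splitIntoWeakWheelDeltas and the step size are hypothetical, and the actual delta threshold the timeline uses for demotion is not stated in this issue.

```typescript
// Hypothetical helper (not part of the existing codebase): split one scroll
// distance into a series of small wheel deltas that imitate a weak trackpad
// gesture. A negative total yields negative steps.
function splitIntoWeakWheelDeltas(totalPx: number, maxStepPx: number): number[] {
  if (maxStepPx <= 0) {
    throw new RangeError("maxStepPx must be positive");
  }
  const sign = Math.sign(totalPx);
  let remaining = Math.abs(totalPx);
  const deltas: number[] = [];
  while (remaining > 0) {
    // Never exceed the per-event step; the last step carries the remainder.
    const step = Math.min(maxStepPx, remaining);
    deltas.push(sign * step);
    remaining -= step;
  }
  return deltas;
}
```

In a Playwright spec, each delta could be sent in sequence with page.mouse.wheel(0, delta) before asserting that the timeline has demoted to reading_history. How the spec reads the current scroll mode depends on whatever stable hooks the session helpers expose.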
B. Computed style assertions
Add computed-style assertions for:
- user bubble hairline: dark --border-weak, light --border-weaker inset box-shadow
- trow-result-body inner descendants: mono-small font + fg-weak color, with no sans/base/large leakage
- collapsible chevron icon size: 12px (DESIGN.md L412)
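The "no sans/base/large leakage" check can be expressed as a comparison of captured computed styles against one expected style. The following is a minimal sketch under stated assumptions: the StyleSample shape and the findStyleLeaks name are hypothetical, and the concrete values behind the mono-small and fg-weak tokens must come from the design system, not from this example.

```typescript
// Assumed shape: the subset of getComputedStyle() output captured for each
// descendant of the result body.
interface StyleSample {
  selector: string;
  fontFamily: string;
  fontSize: string;
  color: string;
}

// Hypothetical helper: return one message per property that differs from the
// expected style, so a failing spec lists every leaking descendant at once.
function findStyleLeaks(
  samples: StyleSample[],
  expected: Omit<StyleSample, "selector">,
): string[] {
  const leaks: string[] = [];
  for (const sample of samples) {
    for (const key of ["fontFamily", "fontSize", "color"] as const) {
      if (sample[key] !== expected[key]) {
        leaks.push(`${sample.selector}: ${key} is "${sample[key]}", expected "${expected[key]}"`);
      }
    }
  }
  return leaks;
}
```

In a spec, the samples could be collected with page.evaluate by walking the descendants of the trow-result-body element and reading getComputedStyle for each one. The spec then asserts that findStyleLeaks returns an empty array. The same pattern would cover the 12px chevron size with a width/height sample.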
C. Handtest checklist segmentation
Update docs/design/session-view-w1-manual-checklist.md to split into two sections:
- CI must pass first: items already covered by the smoke gate
- 5-minute visual pass: items that still require human design judgment
D. Tool failure test command guidance
Document that the shell command false is not a valid stderr-display test command, because by design it exits non-zero without producing any output. Recommend instead:
- ls /definitely-not-exist
- node -e 'console.error("boom"); process.exit(1)'
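The difference between these commands can be checked directly. The following is a minimal sketch, assuming a POSIX environment with false and ls on PATH and a Node.js runtime; the runAndCapture helper is hypothetical and only illustrates why false gives a stderr-display test nothing to show.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run a command without a shell and capture its exit
// status and stderr as text.
function runAndCapture(cmd: string, args: string[]): { status: number | null; stderr: string } {
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  return { status: result.status, stderr: result.stderr };
}

// `false` exits non-zero but writes nothing, so there is no stderr to render.
const silent = runAndCapture("false", []);

// Both recommended commands exit non-zero and also write to stderr.
const missingPath = runAndCapture("ls", ["/definitely-not-exist"]);
const boom = runAndCapture(process.execPath, ["-e", 'console.error("boom"); process.exit(1)']);

console.log({ silent, missingPath, boom });
```

The exact ls error text and exit code vary by platform, so a tool-failure test that needs a stable stderr string may prefer the node -e form.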
Out of scope:
- Electron E2E
- visual-diff infrastructure (Percy / Chromatic / Playwright snapshots)
- Tool Display Spec Registry (tracked separately)
Relevant files or context
Related context:
Likely files:
- packages/app/e2e/session/session-w1-*.spec.ts
- local checklist: docs/design/session-view-w1-manual-checklist.md
- supporting timeline / session helpers if assertions require stable hooks
References:
Verification
- New specs for sections A and B pass within the current smoke runtime budget.
- Checklist is split into CI-covered vs human-visual sections.
- Failure-command guidance is updated in the checklist.
- All new specs pass on a clean main without regressing existing smoke coverage.
Execution mode
Agent should investigate and propose a plan first