[Task] chore(e2e): strengthen W1 regression coverage #597

## Goal

Strengthen W1 regression coverage so that CI enforces more of the current manual handtest and the remaining human pass focuses on true design judgment instead of repeatable renderer checks.

When this task is done, the missing W1 regression coverage is documented and implemented in the correct layers: behavior assertions, computed-style assertions, checklist segmentation, and correct failure-command guidance.

## Scope

In scope:
- Add E2E coverage for the remaining W1 behavior assertions that are still being checked manually.
- Add lower-cost computed-style assertions where a screenshot diff is unnecessary.
- Split the local W1 checklist into “CI must pass first” vs “5-minute visual pass”.
- Document correct stderr-producing commands for tool-failure tests.

Detailed scope:

### A. Behavior assertions

Add E2E specs for scenarios currently relied on by manual handtest:
- scroll: `reading_history` persistence across `content_resize` / `dock_resize`
- scroll: weak trackpad gesture (multiple low-delta wheels) still demotes to `reading_history`
- scroll: nested raw tool output gesture isolation from parent timeline
- chevron: 12px size + collapsed-right / expanded-down orientation
- thinking indicator: only renders when `working && assistantVisible === 0`; disappears once any reasoning / prose / tool appears
- bubble selectability: `user-select: text` on user-message-text / bubble-text / agent-prose / agent-reasoning subtrees
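
The thinking-indicator rule above reduces to a single predicate, which a spec can mirror before asserting on the DOM. The following is a minimal sketch, assuming a hypothetical `SessionViewState` shape; the field names follow the condition quoted in the list and are not taken from the actual renderer code.

```typescript
// Hypothetical state shape for illustration only; the real renderer state
// may be structured differently.
interface SessionViewState {
  working: boolean;
  // Count of visible assistant items (reasoning, prose, or tool rows).
  assistantVisible: number;
}

// The indicator renders only while the agent is working and nothing from the
// assistant is visible yet.
function shouldShowThinkingIndicator(state: SessionViewState): boolean {
  return state.working && state.assistantVisible === 0;
}

// Table of cases an E2E spec would need to cover.
const indicatorCases: Array<{ name: string; state: SessionViewState; expected: boolean }> = [
  { name: "working, nothing visible", state: { working: true, assistantVisible: 0 }, expected: true },
  { name: "working, first item visible", state: { working: true, assistantVisible: 1 }, expected: false },
  { name: "idle, nothing visible", state: { working: false, assistantVisible: 0 }, expected: false },
  { name: "idle, items visible", state: { working: false, assistantVisible: 3 }, expected: false },
];

for (const c of indicatorCases) {
  const actual = shouldShowThinkingIndicator(c.state);
  if (actual !== c.expected) {
    throw new Error(`${c.name}: expected ${c.expected}, got ${actual}`);
  }
}
```

Keeping the cases in a table makes it straightforward to drive the same four states through the real UI and assert on the indicator's presence in each.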

### B. Computed style assertions

Add computed-style assertions for:
- user bubble hairline: inset box-shadow that resolves to `--border-weak` in the dark theme and `--border-weaker` in the light theme
- trow-result-body inner descendants: mono-small font + fg-weak color, with no sans/base/large leakage
- collapsible chevron icon size: 12px (DESIGN.md L412)
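
The "no leakage" check for trow-result-body descendants can be expressed as a filter over collected computed styles. This is a minimal sketch under stated assumptions: `DescendantStyle`, `ExpectedStyle`, and `findStyleLeaks` are hypothetical names, and the concrete font and color values are placeholders, since the resolved values of the mono-small and fg-weak tokens are not given here. In a browser spec the samples would come from `getComputedStyle` on each descendant.

```typescript
// One sampled descendant. In a real spec these fields would be read from
// getComputedStyle(element) inside the browser context.
interface DescendantStyle {
  selector: string;
  fontFamily: string;
  fontSize: string;
  color: string;
}

// Resolved values the descendants are expected to share.
interface ExpectedStyle {
  fontFamily: string;
  fontSize: string;
  color: string;
}

// Returns every sample whose font or color differs from the expected values.
function findStyleLeaks(samples: DescendantStyle[], expected: ExpectedStyle): DescendantStyle[] {
  return samples.filter(
    (s) =>
      s.fontFamily !== expected.fontFamily ||
      s.fontSize !== expected.fontSize ||
      s.color !== expected.color,
  );
}

// Placeholder values, NOT the real design tokens.
const expectedStyle: ExpectedStyle = {
  fontFamily: "monospace",
  fontSize: "12px",
  color: "rgb(120, 120, 120)",
};

const sampledStyles: DescendantStyle[] = [
  { selector: "pre", fontFamily: "monospace", fontSize: "12px", color: "rgb(120, 120, 120)" },
  { selector: "pre > span", fontFamily: "monospace", fontSize: "12px", color: "rgb(120, 120, 120)" },
  // A descendant that picked up a sans/base style by mistake.
  { selector: "pre > a", fontFamily: "sans-serif", fontSize: "14px", color: "rgb(120, 120, 120)" },
];

const leaks = findStyleLeaks(sampledStyles, expectedStyle);
console.log(leaks.map((l) => l.selector));
```

Returning the offending samples rather than a boolean lets the spec's failure message name the exact descendant that leaked.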

### C. Handtest checklist segmentation

Update `docs/design/session-view-w1-manual-checklist.md` to split into two sections:
- `CI must pass first` — items already covered by smoke gate
- `5-minute visual pass` — items that still require human design judgment

### D. Tool failure test command guidance

Document that `false` is not a valid stderr-display test command: by design it exits non-zero without writing any output, so there is nothing to render. Recommend instead:
- `ls /definitely-not-exist`
- `node -e 'console.error("boom"); process.exit(1)'`
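
The difference between a silent failure and a stderr-producing one can be checked directly from Node. The sketch below uses `spawnSync` from `node:child_process`; `runAndCapture` is a hypothetical helper name, and `process.execPath` stands in for `node` so the example does not depend on `PATH`.

```typescript
import { spawnSync } from "node:child_process";

interface CommandResult {
  status: number | null; // null if the process could not be spawned
  stdout: string;
  stderr: string;
}

// Runs a command synchronously and returns its exit status and captured output.
function runAndCapture(command: string, args: string[]): CommandResult {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return {
    status: result.status,
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
  };
}

// Fails and writes to stderr, so a stderr display has something to render.
const noisy = runAndCapture(process.execPath, [
  "-e",
  'console.error("boom"); process.exit(1)',
]);

// `false` fails without writing anything (assumes a POSIX system where the
// `false` utility is available).
const silent = runAndCapture("false", []);

console.log({ noisyStatus: noisy.status, noisyStderr: noisy.stderr.trim() });
console.log({ silentStatus: silent.status, silentStderrLength: silent.stderr.length });
```

A fixture built on the first command gives the failure test a non-empty stderr to assert against, whereas one built on `false` can only ever assert on the exit status.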

Out of scope:
- Electron E2E
- visual-diff infrastructure (Percy / Chromatic / Playwright snapshots)
- Tool Display Spec Registry (tracked separately)

## Relevant files or context

Related context:
- Task #71 research concluded that the W1 manual handtest is too expensive because CI does not yet gate enough renderer and interaction behavior.
- PR #589 already delivers smoke-gate inclusion as the first step; this issue tracks the remaining work.
- Tool summary follow-up is tracked separately in #596.

Likely files:
- `packages/app/e2e/session/session-w1-*.spec.ts`
- local checklist: `docs/design/session-view-w1-manual-checklist.md`
- supporting timeline / session helpers if assertions require stable hooks

References:
- PR #589 (slice 11b.1)
- #596 — agentic trow summary follow-up

## Verification

- New specs for sections A and B pass within the current smoke runtime budget.
- Checklist is split into CI-covered vs human-visual sections.
- Failure-command guidance is updated in the checklist.
- All new specs pass on a clean main without regressing existing smoke coverage.

## Execution mode

Agent should investigate and propose a plan first.

