[Task] chore(e2e): strengthen W1 regression coverage #597

@Astro-Han

Goal

Strengthen W1 regression coverage so that more of the current manual handtest becomes enforced by CI, and the remaining human pass is focused on true design judgment instead of repeatable renderer checks.

When this task is done, the missing W1 regression coverage is documented and implemented in the correct layers: behavior assertions, computed-style assertions, checklist segmentation, and correct failure-command guidance.

Scope

In scope:

  • Add E2E coverage for the remaining W1 behavior assertions that are still being checked manually.
  • Add lower-cost computed-style assertions where screenshot diff is unnecessary.
  • Split the local W1 checklist into “CI must pass first” vs “5-minute visual pass”.
  • Document correct stderr-producing commands for tool-failure tests.

Detailed scope:

A. Behavior assertions

Add E2E specs for scenarios that are currently verified only by the manual handtest:

  • scroll: reading_history persistence across content_resize / dock_resize
  • scroll: weak trackpad gesture (multiple low-delta wheels) still demotes to reading_history
  • scroll: nested raw tool output gesture isolation from parent timeline
  • chevron: 12px size + collapsed-right / expanded-down orientation
  • thinking indicator: only renders when working && assistantVisible === 0; disappears once any reasoning / prose / tool appears
  • bubble selectability: user-select: text on user-message-text / bubble-text / agent-prose / agent-reasoning subtrees
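
The thinking-indicator rule in the list above is a pure predicate, so its expected truth table can be written down before any E2E wiring exists. A minimal sketch, assuming hypothetical names (`TimelineState`, `shouldShowThinkingIndicator`, `thinkingIndicatorCases`) that do not come from the codebase, and assuming `assistantVisible` is a count of visible reasoning / prose / tool items:

```typescript
// Hypothetical model of the indicator rule; names are illustrative, not the app's real API.
interface TimelineState {
  working: boolean;
  // Assumed: number of visible assistant items (reasoning, prose, or tool rows).
  assistantVisible: number;
}

// The indicator renders only while the agent is working and nothing assistant-side is visible yet.
function shouldShowThinkingIndicator(state: TimelineState): boolean {
  return state.working && state.assistantVisible === 0;
}

// Expected truth table that a spec could iterate over, one row per scenario.
const thinkingIndicatorCases: Array<[TimelineState, boolean]> = [
  [{ working: true, assistantVisible: 0 }, true],
  [{ working: true, assistantVisible: 1 }, false],
  [{ working: false, assistantVisible: 0 }, false],
  [{ working: false, assistantVisible: 2 }, false],
];
```

In a real spec each row would presumably drive the session into the given state and assert on the indicator's visibility in the DOM; the table only pins down which outcome to expect.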

B. Computed-style assertions

Add computed-style assertions for:

  • user bubble hairline: inset box-shadow using --border-weak in the dark theme and --border-weaker in the light theme
  • trow-result-body inner descendants: mono-small font + fg-weak color, with no sans/base/large leakage
  • collapsible chevron icon size: 12px (DESIGN.md L412)
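
The "no sans/base/large leakage" check for trow-result-body descendants can be expressed as a pure comparison over collected style samples. A minimal sketch, assuming the samples would be gathered elsewhere (for example with `getComputedStyle` inside the browser context); `StyleSample`, `ExpectedStyle`, and `findStyleLeaks` are hypothetical names, and the concrete values below are placeholders standing in for whatever mono-small and fg-weak resolve to:

```typescript
// Hypothetical snapshot of one descendant's computed style.
interface StyleSample {
  selector: string;
  fontFamily: string;
  fontSize: string;
  color: string;
}

interface ExpectedStyle {
  fontFamily: string;
  fontSize: string;
  color: string;
}

// Returns one description per descendant whose computed style differs from the expected one.
function findStyleLeaks(samples: StyleSample[], expected: ExpectedStyle): string[] {
  const leaks: string[] = [];
  for (const sample of samples) {
    const diffs: string[] = [];
    if (sample.fontFamily !== expected.fontFamily) diffs.push(`fontFamily=${sample.fontFamily}`);
    if (sample.fontSize !== expected.fontSize) diffs.push(`fontSize=${sample.fontSize}`);
    if (sample.color !== expected.color) diffs.push(`color=${sample.color}`);
    if (diffs.length > 0) leaks.push(`${sample.selector}: ${diffs.join(", ")}`);
  }
  return leaks;
}

// Placeholder values only; the real tokens live in the design system, not in this sketch.
const expectedResultBody: ExpectedStyle = {
  fontFamily: "monospace",
  fontSize: "11px",
  color: "rgb(120, 120, 120)",
};

const sampleLeaks = findStyleLeaks(
  [
    { selector: "pre", fontFamily: "monospace", fontSize: "11px", color: "rgb(120, 120, 120)" },
    { selector: "span.label", fontFamily: "sans-serif", fontSize: "14px", color: "rgb(120, 120, 120)" },
  ],
  expectedResultBody,
);
```

Asserting that the returned list is empty gives a failure message naming each offending descendant, which is cheaper to diagnose than a screenshot diff.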

C. Handtest checklist segmentation

Update docs/design/session-view-w1-manual-checklist.md to split into two sections:

  • CI must pass first — items already covered by smoke gate
  • 5-minute visual pass — items that still require human design judgment

D. Tool failure test command guidance

Document that the false command is not a valid stderr-display test command: it exits non-zero but produces no output by design. Recommend instead:

  • ls /definitely-not-exist
  • node -e 'console.error("boom"); process.exit(1)'
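
The difference between a silent failure and a stderr-producing failure can be checked directly from Node. A minimal sketch, assuming a hypothetical helper `runAndCapture`; it uses `process.execPath` instead of relying on `node` being on PATH, and uses `node -e "process.exit(1)"` as a portable stand-in for the silent behavior of false:

```typescript
import { spawnSync } from "node:child_process";

interface CommandResult {
  status: number | null;
  stderr: string;
}

// Runs a command synchronously and returns its exit status and captured stderr text.
function runAndCapture(command: string, args: string[]): CommandResult {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return { status: result.status, stderr: result.stderr ?? "" };
}

// Fails without writing anything, like false: nothing for a stderr display to render.
const silentFailure = runAndCapture(process.execPath, ["-e", "process.exit(1)"]);

// Fails and writes to stderr: suitable for exercising the tool-failure display.
const noisyFailure = runAndCapture(process.execPath, [
  "-e",
  'console.error("boom"); process.exit(1)',
]);
```

Both commands exit with a non-zero status, but only the second gives the failure view any text to show, which is why it is the better fixture for stderr-display tests.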

Out of scope:

  • Electron E2E
  • visual-diff infrastructure (Percy / Chromatic / Playwright snapshots)
  • Tool Display Spec Registry (tracked separately)

Relevant files or context

Related context:

Likely files:

  • packages/app/e2e/session/session-w1-*.spec.ts
  • local checklist: docs/design/session-view-w1-manual-checklist.md
  • supporting timeline / session helpers if assertions require stable hooks

References:

Verification

  • New A/B specs pass within the current smoke runtime budget.
  • Checklist is split into CI-covered vs human-visual sections.
  • Failure-command guidance is updated in the checklist.
  • All new specs pass on a clean main without regressing existing smoke coverage.

Execution mode

Agent should investigate and propose a plan first

Metadata

Assignees

No one assigned

Labels

  • app: Application behavior and product flows
  • enhancement: New feature or request
  • task: Maintainer or agent execution task
