steipete · HeMuling · Mar 16, 2026
diff --git a/docs/browser-mode.md b/docs/browser-mode.md
@@ -49,7 +49,7 @@ You can pass the same payload inline (`--browser-inline-cookies '<json or base64
    - (Optional) copies cookies from the requested browser profile via Oracle’s built-in cookie reader (Keychain/DPAPI aware) so you stay signed in.
    - Navigates to `chatgpt.com`, switches the model to the requested GPT-5.4 / GPT-5.2 variant, pastes the prompt, waits for completion, and copies the markdown via the built-in “copy turn” button.
    - Immediately probes `/backend-api/me` in the ChatGPT tab to verify the session is authenticated; if the endpoint returns 401/403 we abort early with a login-specific error instead of timing out waiting for the composer.
-   - When `--file` inputs would push the pasted composer content over ~60k characters, we switch to uploading attachments (optionally bundled) and wait for ChatGPT to re-enable the send button before submitting the combined system+user prompt.
+   - When `--file` inputs would push the pasted composer content over ~60k characters, we switch to uploading attachments (optionally bundled), wait for attachment evidence to stabilize in the composer, and then wait for the composer-specific send button to become clickable before submitting the combined system+user prompt. If ChatGPT never reaches a send-ready state, attachment submissions fail instead of falling back to a plain Enter keypress.
    - Cleans up the temporary profile unless `--browser-keep-browser` is passed.
 3. **Session integration** – browser sessions use the normal log writer, add `mode: "browser"` plus `browser.config/runtime` metadata, and log the Chrome PID/port so `oracle session <id>` (or `oracle status <id>`) shows a marker for the background Chrome process.
 4. **Usage accounting** – we estimate input tokens with the same tokenizer used for API runs and estimate output tokens via `estimateTokenCount`. `oracle status` therefore shows comparable cost/timing info even though the call ran through the browser.
@@ -228,7 +228,7 @@ This mode is ideal when you have a macOS VM (or spare Mac mini) logged into Chat
 
 ## Limitations / Follow-Up Plan
 
-- **Attachment lifecycle** – in `auto` mode we prefer inlining files into the composer (fewer moving parts). When we do upload, each `--file` path is uploaded separately (or bundled) so ChatGPT can ingest filenames/content. The automation waits for uploads to finish (send button enabled, upload chips visible) before submitting. When inline paste is rejected by ChatGPT (too large), Oracle retries automatically with uploads.
+- **Attachment lifecycle** – in `auto` mode we prefer inlining files into the composer (fewer moving parts). When we do upload, each `--file` path is uploaded separately (or bundled) so ChatGPT can ingest filenames/content. The automation treats upload completion and send readiness as separate gates: it first waits for stable attachment evidence; then, once the prompt is in the composer, it waits for that same composer’s send button to become clickable before clicking it. If attachment evidence never stabilizes, Oracle fails the run instead of degrading into a plain-text Enter submit. When inline paste is rejected by ChatGPT (too large), Oracle retries automatically with uploads.
 - **Model picker drift** – we rely on heuristics to pick GPT-5.4 / GPT-5.2 variants. If OpenAI changes the DOM we need to refresh the selectors quickly. Consider snapshot tests or a small “self check” command.
 - **Non-mac platforms** – window hiding uses AppleScript today; Linux/Windows just ignore the flag. We should detect platforms explicitly and document the behavior.
 - **Streaming UX** – browser runs cannot stream tokens, so we log a warning before launching Chrome. Investigate whether we can stream clipboard deltas via mutation observers for a closer UX.

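The two-gate behavior described in the attachment lifecycle note can be sketched as a pair of pure functions. This is only an illustration under stated assumptions, not Oracle's actual implementation: the names `attachmentsStable` and `chooseSubmitAction`, the three-poll streak, and the snapshot shape are all hypothetical. It also assumes that the Enter-key submit remains available for runs without attachments, which the docs imply but do not spell out.

```typescript
// Hypothetical sketch; none of these names come from the Oracle codebase.

/** One polled reading of the composer: attachment labels currently visible. */
type AttachmentSnapshot = readonly string[];

/**
 * Gate 1: evidence is "stable" when the last `requiredStreak` polls are
 * identical, non-empty, and list every expected file.
 */
function attachmentsStable(
  polls: readonly AttachmentSnapshot[],
  expected: readonly string[],
  requiredStreak = 3, // assumed value, chosen only for illustration
): boolean {
  if (polls.length < requiredStreak) return false;
  const tail = polls.slice(-requiredStreak);
  // Order-insensitive fingerprint of a snapshot.
  const key = (s: AttachmentSnapshot) => s.slice().sort().join("\n");
  const first = key(tail[0]);
  const allSame = tail.every((s) => key(s) === first);
  const complete = expected.every((name) => tail[0].some((n) => n === name));
  return tail[0].length > 0 && allSame && complete;
}

type SubmitAction = "click-send" | "enter-key" | "fail";

/**
 * Gate 2 plus the no-fallback rule: with attachments, the only way to submit
 * is a clickable send button after stable evidence; anything else fails.
 */
function chooseSubmitAction(state: {
  hasAttachments: boolean;
  attachmentsStable: boolean;
  sendClickable: boolean;
}): SubmitAction {
  if (state.hasAttachments) {
    if (!state.attachmentsStable) return "fail";
    return state.sendClickable ? "click-send" : "fail";
  }
  // Assumption: plain-text runs may still submit via the Enter key.
  return state.sendClickable ? "click-send" : "enter-key";
}
```

Keeping the decision separate from the polling makes the rule easy to unit-test: a real run would feed `attachmentsStable` with readings taken from the composer and pass the result into `chooseSubmitAction`.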
diff --git a/docs/manual-tests.md b/docs/manual-tests.md
@@ -158,6 +158,18 @@ Run these four smoke tests whenever we touch browser automation:
    Prepare `/tmp/browser-report.txt` with faux metrics, then run  
    `pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "Use the attachment to report current CPU and memory figures" --file /tmp/browser-report.txt --verbose`  
    Verify verbose logs show attachment upload and the final answer matches the file data.
+   Expected attachment-send logs:
+   - `Attachment queued`
+   - `All attachments uploaded`
+   - `Clicked send button`
+   - no `Submitted prompt via Enter key` after the attachment upload stage
+
+5. **Attachment send race guard**
+   Prepare a small text file at `/tmp/browser-report.txt` (the file from the previous test works), then run
+   `pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "Reply exactly with OK." --file /tmp/browser-report.txt --verbose`
+   Validate one of these outcomes:
+   - success path: `All attachments uploaded` followed by `Clicked send button`, then the assistant answer
+   - fail-fast path: an explicit attachment/browser automation error before send, with no Enter fallback
 
 Record session IDs and outcomes in the PR description (pass/fail, notable delays). This ensures reviewers can audit real runs.
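When auditing a run, the expected log ordering from the attachment tests can be checked mechanically. The sketch below is a hypothetical helper (`checkAttachmentSendLogs` is not part of Oracle) that assumes the verbose output has been captured as an array of lines. It matches only the log phrases quoted in the test steps above and covers the success path; the fail-fast path is not checked, because the exact error wording is not specified here.

```typescript
// Hypothetical helper for reviewing captured --verbose output.

type LogVerdict = { ok: boolean; reason: string };

function checkAttachmentSendLogs(lines: readonly string[]): LogVerdict {
  // Index of the first line at or after `from` that contains `needle`, or -1.
  const find = (needle: string, from = 0): number => {
    for (let i = from; i < lines.length; i++) {
      if (lines[i].indexOf(needle) !== -1) return i;
    }
    return -1;
  };

  const uploaded = find("All attachments uploaded");
  if (uploaded === -1) {
    return { ok: false, reason: "upload completion was never logged" };
  }
  // The Enter fallback must not appear once the upload stage has finished.
  if (find("Submitted prompt via Enter key", uploaded) !== -1) {
    return { ok: false, reason: "Enter fallback used after attachment upload" };
  }
  if (find("Clicked send button", uploaded) === -1) {
    return { ok: false, reason: "send button click not logged after upload" };
  }
  return { ok: true, reason: "upload completed, then send button clicked" };
}
```

A reviewer could split the saved terminal output on newlines, pass it to `checkAttachmentSendLogs`, and paste the resulting verdict next to the session ID in the PR description.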