fix: Codex hang fixes — plan visibility, stdout buffering, reasoning effort (v0.12.4.0) (garrytan#536)
* fix: unbuffer Python stdout in codex --json streaming
Python fully buffers stdout when it is piped rather than attached to a
TTY. The `codex exec --json | python3 -c "..."` pattern therefore showed
no output until process exit — users saw nothing for 30+ minutes.
Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True
to all print() calls in all three Python parser blocks (Challenge,
Consult new session, Consult resumed session).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
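A minimal sketch of the unbuffered streaming-parser pattern this commit describes (the event schema and the `type` field are assumptions for illustration, not the actual codex JSONL format):

```python
import json
import sys

def parse_events(lines):
    """Yield event types from JSONL lines (hypothetical codex schema)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or non-JSON lines
        yield event.get("type", "unknown")

if __name__ == "__main__":
    # flush=True defeats Python's full buffering when stdout is a pipe,
    # so progress appears line by line instead of all at once on exit.
    for kind in parse_events(sys.stdin):
        print(f"[{kind}]", flush=True)
```

In the skill this runs as `codex exec --json | PYTHONUNBUFFERED=1 python3 -u -c "..."`, so all three layers (the env var, `-u`, and `flush=True`) defend against pipe buffering.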
* fix: per-mode reasoning effort defaults, add --xhigh override
xhigh reasoning uses ~23x more tokens than high and causes 50+ minute
hangs on large-context tasks (OpenAI issues #8545, #8402, #6931).
Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)
Also changes all Outside Voice / adversarial codex invocations
across gstack (resolvers, gen-skill-docs) from xhigh to high.
Users can override with --xhigh flag when they want max reasoning.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
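The per-mode defaulting described above could be sketched as follows (the dict and helper are illustrative, not the skill's actual code):

```python
# Per-mode reasoning-effort defaults from the commit above; the lookup
# helper itself is a hypothetical sketch, not the skill's actual code.
DEFAULT_EFFORT = {
    "review": "high",      # bounded diff, needs thoroughness
    "challenge": "high",   # adversarial but bounded by diff size
    "consult": "medium",   # large context, interactive, needs speed
}

def reasoning_effort(mode: str, xhigh: bool = False) -> str:
    """Resolve the effort level for a /codex mode, honoring --xhigh."""
    if xhigh:
        return "xhigh"  # explicit opt-in to maximum reasoning
    return DEFAULT_EFFORT.get(mode, "high")
```

The override always wins, so `--xhigh` behaves identically in every mode.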
* fix: explicit plan content embedding for codex sandbox visibility
Codex runs sandboxed to repo root (-C) and cannot access
~/.claude/plans/. The template already instructed embedding the plan
content but wasn't explicit enough — Claude sometimes took the shortcut
of referencing the file path instead, causing Codex to waste 10+ tool
calls searching before giving up.
Strengthen the instruction to make embedding unambiguous: "embed
FULL CONTENT, do NOT reference the file path." Also extract
referenced source file paths from the plan so Codex reads them
directly instead of discovering via rg/find.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
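A sketch of the embed-don't-reference idea (the path-extraction regex and prompt wording are illustrative assumptions, not the template's actual logic):

```python
import re
from pathlib import Path

def build_codex_prompt(plan_path: str) -> str:
    """Embed FULL plan content in the prompt; never pass the path itself.

    Codex is sandboxed to the repo root, so a file under ~/.claude/plans/
    is invisible to it. The regex below is an illustrative sketch of
    extracting referenced source paths, not the skill's actual logic.
    """
    plan = Path(plan_path).read_text()
    # List source files mentioned in the plan so Codex reads them
    # directly instead of discovering them via rg/find.
    referenced = sorted(set(re.findall(r"\b[\w./-]+\.(?:py|md|ts|go)\b", plan)))
    file_list = "\n".join(f"- {p}" for p in referenced)
    return (
        "PLAN (full content embedded, not a path):\n"
        f"{plan}\n\n"
        "Source files referenced by the plan (read these directly):\n"
        f"{file_list}"
    )
```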
* fix: add --xhigh reminder to challenge and consult modes
The --xhigh override was only documented in Step 2A (review).
Steps 2B (challenge) and 2C (consult) lacked the reminder,
so the flag would silently do nothing for those modes.
Found by adversarial review.
* chore: bump version and changelog (v0.12.4.0)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Three bugs in `/codex` caused 30+ minute hangs with zero output during plan reviews and adversarial checks. All three are fixed.
### Fixed

- **Plan files now visible to Codex sandbox.** Codex runs sandboxed to the repo root and couldn't see plan files at `~/.claude/plans/`. It would waste 10+ tool calls searching before giving up. Now the plan content is embedded directly in the prompt, and referenced source files are listed so Codex reads them immediately.
- **Streaming output actually streams.** Python's stdout buffering meant zero output visible until the process exited. Added `PYTHONUNBUFFERED=1`, `python3 -u`, and `flush=True` on every print call across all three Codex modes.
- **Sane reasoning effort defaults.** Replaced hardcoded `xhigh` (~23x more tokens, known 50+ min hangs per OpenAI issues #8545, #8402, #6931) with per-mode defaults: `high` for review and challenge, `medium` for consult. Users can override with the `--xhigh` flag when they want maximum reasoning.
- **`--xhigh` override works in all modes.** The override reminder was missing from challenge and consult mode instructions. Found by adversarial review.
## [0.12.4.0] - 2026-03-26 — Full Commit Coverage in /ship
4
15
5
16
When you ship a branch with 12 commits spanning performance work, dead code removal, and test infra, the PR should mention all three. It wasn't. The CHANGELOG and PR summary biased toward whatever happened most recently, silently dropping earlier work.
3. Capture the output. Then parse cost from stderr:

````diff
@@ -563,8 +573,11 @@ With focus (e.g., "security"):
     "Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial."
 
 2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
+
+   If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
     <same python streaming parser as above, with flush=True on all print() calls>
     "
     ```
@@ -718,7 +745,14 @@ Session saved — run /codex again to continue this conversation.
 agentic coding model). This means as OpenAI ships newer models, /codex automatically
 uses them. If the user wants a specific model, pass `-m` through to codex.
 
-**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
+**Reasoning effort (per-mode defaults):**
+- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens
+- **Challenge (2B):** `high` — adversarial but bounded by diff size
+- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed
+
+`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context
+tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag
+(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait.
 
 **Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
 docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
````