Skip to content

fix: Codex hang fixes — plan visibility, stdout buffering, reasoning effort (v0.12.4.0)#536

Merged
garrytan merged 7 commits intomainfrom
garrytan/codex-hang-fixes
Mar 27, 2026
Merged

fix: Codex hang fixes — plan visibility, stdout buffering, reasoning effort (v0.12.4.0)#536
garrytan merged 7 commits intomainfrom
garrytan/codex-hang-fixes

Conversation

@garrytan
Copy link
Copy Markdown
Owner

Summary

  • Plan files now visible to Codex sandbox. Codex runs sandboxed to the repo root and couldn't see plan files at ~/.claude/plans/. Now the plan content is embedded directly in the prompt, and referenced source files are listed so Codex reads them immediately.
  • Streaming output actually streams. Python's stdout buffering meant zero output visible until the process exited. Added PYTHONUNBUFFERED=1, python3 -u, and flush=True on every print call across all three Codex modes.
  • Sane reasoning effort defaults. Replaced hardcoded xhigh (23x more tokens, known 50+ min hangs per OpenAI issues #8545, #8402, #6931) with per-mode defaults: high for review/challenge, medium for consult. Users can override with --xhigh.
  • --xhigh override works in all modes. The override reminder was missing from challenge and consult mode instructions. Found by adversarial review.

Pre-Landing Review

No issues found.

Adversarial Review

Claude adversarial subagent (medium tier, 172 lines). 10 findings — 1 fixed (--xhigh reminder gap), 4 pre-existing, 5 intentional/acceptable.

Test plan

  • All tests pass (exit 0)
  • Skill validation passes
  • Gen-skill-docs quality checks pass
  • VERSION matches package.json

🤖 Generated with Claude Code

garrytan and others added 7 commits March 26, 2026 12:17
Python fully buffers stdout when piped (not a TTY). The
`codex exec --json | python3 -c "..."` pattern meant zero output
visible until process exit — users saw nothing for 30+ minutes.

Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True
to all print() calls in all three Python parser blocks (Challenge,
Consult new session, Consult resumed session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs
on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations
across gstack (resolvers, gen-skill-docs) from xhigh to high.
Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex runs sandboxed to repo root (-C) and cannot access
~/.claude/plans/. The template already instructed content embedding
but wasn't explicit enough — Claude sometimes shortcut to
referencing the file path, causing Codex to waste 10+ tool calls
searching before giving up.

Strengthen the instruction to make embedding unambiguous: "embed
FULL CONTENT, do NOT reference the file path." Also extract
referenced source file paths from the plan so Codex reads them
directly instead of discovering via rg/find.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --xhigh override was only documented in Step 2A (review).
Steps 2B (challenge) and 2C (consult) lacked the reminder,
so the flag would silently do nothing for those modes.
Found by adversarial review.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main shipped v0.12.4.0 (full commit coverage in /ship) while this
branch also used v0.12.4.0. Resolved by keeping both CHANGELOG
entries and bumping this branch to v0.12.5.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

E2E Evals: ✅ PASS

36/36 tests passed | $4.27 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 5/5 $0.2
e2e-deploy 3/3 $0.61
e2e-design 3/3 $0.55
e2e-plan 7/7 $1.09
e2e-qa-workflow 1/1 $0.09
e2e-review 5/5 $0.74
e2e-workflow 2/2 $0.15
llm-judge 5/5 $0.1
e2e-review 5/5 $0.74

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 1b60acd into main Mar 27, 2026
18 checks passed
KimYoungHwan8750 pushed a commit to KimYoungHwan8750/gstack-ko that referenced this pull request Mar 27, 2026
…effort (v0.12.4.0) (garrytan#536)

* fix: unbuffer Python stdout in codex --json streaming

Python fully buffers stdout when piped (not a TTY). The
`codex exec --json | python3 -c "..."` pattern meant zero output
visible until process exit — users saw nothing for 30+ minutes.

Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True
to all print() calls in all three Python parser blocks (Challenge,
Consult new session, Consult resumed session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: per-mode reasoning effort defaults, add --xhigh override

xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs
on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations
across gstack (resolvers, gen-skill-docs) from xhigh to high.
Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: explicit plan content embedding for codex sandbox visibility

Codex runs sandboxed to repo root (-C) and cannot access
~/.claude/plans/. The template already instructed content embedding
but wasn't explicit enough — Claude sometimes shortcut to
referencing the file path, causing Codex to waste 10+ tool calls
searching before giving up.

Strengthen the instruction to make embedding unambiguous: "embed
FULL CONTENT, do NOT reference the file path." Also extract
referenced source file paths from the plan so Codex reads them
directly instead of discovering via rg/find.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --xhigh reminder to challenge and consult modes

The --xhigh override was only documented in Step 2A (review).
Steps 2B (challenge) and 2C (consult) lacked the reminder,
so the flag would silently do nothing for those modes.
Found by adversarial review.

* chore: bump version and changelog (v0.12.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rapidstartup pushed a commit to rapidstartup/gstack that referenced this pull request Mar 29, 2026
…effort (v0.12.4.0) (garrytan#536)

* fix: unbuffer Python stdout in codex --json streaming

Python fully buffers stdout when piped (not a TTY). The
`codex exec --json | python3 -c "..."` pattern meant zero output
visible until process exit — users saw nothing for 30+ minutes.

Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True
to all print() calls in all three Python parser blocks (Challenge,
Consult new session, Consult resumed session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: per-mode reasoning effort defaults, add --xhigh override

xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs
on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations
across gstack (resolvers, gen-skill-docs) from xhigh to high.
Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: explicit plan content embedding for codex sandbox visibility

Codex runs sandboxed to repo root (-C) and cannot access
~/.claude/plans/. The template already instructed content embedding
but wasn't explicit enough — Claude sometimes shortcut to
referencing the file path, causing Codex to waste 10+ tool calls
searching before giving up.

Strengthen the instruction to make embedding unambiguous: "embed
FULL CONTENT, do NOT reference the file path." Also extract
referenced source file paths from the plan so Codex reads them
directly instead of discovering via rg/find.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --xhigh reminder to challenge and consult modes

The --xhigh override was only documented in Step 2A (review).
Steps 2B (challenge) and 2C (consult) lacked the reminder,
so the flag would silently do nothing for those modes.
Found by adversarial review.

* chore: bump version and changelog (v0.12.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant