fix: Codex hang fixes — plan visibility, stdout buffering, reasoning effort (v0.12.4.0) by garrytan · Pull Request #536 · garrytan/gstack

garrytan · 2026-03-27T00:02:06Z

Summary

Plan files now visible to Codex sandbox. Codex runs sandboxed to the repo root and couldn't see plan files at ~/.claude/plans/. Now the plan content is embedded directly in the prompt, and referenced source files are listed so Codex reads them immediately.
Streaming output actually streams. Python's stdout buffering meant zero output visible until the process exited. Added PYTHONUNBUFFERED=1, python3 -u, and flush=True on every print call across all three Codex modes.
Sane reasoning effort defaults. Replaced hardcoded xhigh (23x more tokens, known 50+ min hangs per OpenAI issues #8545, #8402, #6931) with per-mode defaults: high for review/challenge, medium for consult. Users can override with --xhigh.
--xhigh override works in all modes. The override reminder was missing from challenge and consult mode instructions. Found by adversarial review.

Pre-Landing Review

No issues found.

Adversarial Review

Claude adversarial subagent (medium tier, 172 lines). 10 findings — 1 fixed (--xhigh reminder gap), 4 pre-existing, 5 intentional/acceptable.

Test plan

All tests pass (exit 0)
Skill validation passes
Gen-skill-docs quality checks pass
VERSION matches package.json

🤖 Generated with Claude Code

Python fully buffers stdout when piped (not a TTY). The `codex exec --json | python3 -c "..."` pattern meant zero output visible until process exit — users saw nothing for 30+ minutes. Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True to all print() calls in all three Python parser blocks (Challenge, Consult new session, Consult resumed session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
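A minimal sketch of the kind of parser this commit describes, to show where flush=True matters. It is not the parser from the PR: the function name `stream_events` and the event fields `type` and `text` are assumptions made for illustration, and the real Codex JSON event schema may differ.

```python
import json
import sys


def stream_events(lines, out=sys.stdout):
    """Print one summary line per JSON event as soon as it is read.

    `lines` is any iterable of text lines (for example sys.stdin).
    The "type" and "text" keys are hypothetical field names.
    Returns the number of events printed.
    """
    printed = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between events
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore non-JSON noise in the stream
        if not isinstance(event, dict):
            continue  # only JSON objects are treated as events
        kind = event.get("type", "?")
        text = event.get("text", "")
        # flush=True pushes each line out immediately, even when
        # stdout is a pipe and would otherwise be block-buffered.
        print(f"[{kind}] {text}", file=out, flush=True)
        printed += 1
    return printed
```

Calling `stream_events(sys.stdin)` from a script on the receiving end of a pipe would emit each event as it arrives. Running the interpreter with `-u` or setting PYTHONUNBUFFERED=1 covers any print call that forgets the flush.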

xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations across gstack (resolvers, gen-skill-docs) from xhigh to high. Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
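The per-mode defaults and the --xhigh override amount to a small decision table. The sketch below expresses it in Python purely as an illustration; in gstack the behavior is described as skill template instructions, and the names `DEFAULT_EFFORT` and `pick_effort` are hypothetical.

```python
# Defaults as listed in the commit message above.
DEFAULT_EFFORT = {
    "review": "high",     # bounded diff, needs thoroughness
    "challenge": "high",  # adversarial but bounded by diff
    "consult": "medium",  # large context, interactive, needs speed
}


def pick_effort(mode, args):
    """Return the reasoning effort for a mode, honoring --xhigh.

    `mode` is one of the three /codex modes; `args` is the list of
    user-supplied flags. Both names are assumptions for this sketch.
    """
    if mode not in DEFAULT_EFFORT:
        raise ValueError(f"unknown mode: {mode!r}")
    if "--xhigh" in args:
        return "xhigh"  # explicit user override applies in every mode
    return DEFAULT_EFFORT[mode]
```

Checking the override before the per-mode lookup is what keeps the flag from silently doing nothing in challenge and consult.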

Codex runs sandboxed to repo root (-C) and cannot access ~/.claude/plans/. The template already instructed content embedding but wasn't explicit enough — Claude sometimes shortcut to referencing the file path, causing Codex to waste 10+ tool calls searching before giving up. Strengthen the instruction to make embedding unambiguous: "embed FULL CONTENT, do NOT reference the file path." Also extract referenced source file paths from the plan so Codex reads them directly instead of discovering via rg/find.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
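One way to picture the embedding step is the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the PR changes prompt-template wording, whereas this code assumes the plan marks file paths with backticks, and the names `referenced_paths` and `build_prompt` are hypothetical.

```python
import re

# Assumption: the plan wraps file paths in backticks, e.g. `src/app.ts`.
BACKTICKED = re.compile(r"`([^`\s]+)`")


def referenced_paths(plan_text):
    """Return repo-relative paths mentioned in the plan, in order, deduplicated."""
    paths = []
    for token in BACKTICKED.findall(plan_text):
        looks_like_path = "/" in token or "." in token
        # Paths under / or ~ lie outside the repo-root sandbox, so skip them.
        inside_repo = not token.startswith(("/", "~"))
        if looks_like_path and inside_repo and token not in paths:
            paths.append(token)
    return paths


def build_prompt(plan_text, question):
    """Embed the full plan text in the prompt instead of pointing at its file."""
    parts = [question.strip(), "", "PLAN (full content, embedded):", plan_text.strip()]
    paths = referenced_paths(plan_text)
    if paths:
        parts += ["", "Read these files first:"]
        parts += [f"- {p}" for p in paths]
    return "\n".join(parts)
```

The plan body travels inside the prompt, so nothing depends on the sandbox being able to open ~/.claude/plans/, and the explicit file list spares the search with rg/find.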

github-actions · 2026-03-27T00:11:42Z

E2E Evals: ✅ PASS

36/36 tests passed | $4.27 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	5/5	✅	$0.2
e2e-deploy	3/3	✅	$0.61
e2e-design	3/3	✅	$0.55
e2e-plan	7/7	✅	$1.09
e2e-qa-workflow	1/1	✅	$0.09
e2e-review	5/5	✅	$0.74
e2e-workflow	2/2	✅	$0.15
llm-judge	5/5	✅	$0.1
e2e-review	5/5	✅	$0.74

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

…effort (v0.12.4.0) (garrytan#536)

* fix: unbuffer Python stdout in codex --json streaming

Python fully buffers stdout when piped (not a TTY). The `codex exec --json | python3 -c "..."` pattern meant zero output visible until process exit — users saw nothing for 30+ minutes. Add PYTHONUNBUFFERED=1 env var, python3 -u flag, and flush=True to all print() calls in all three Python parser blocks (Challenge, Consult new session, Consult resumed session).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: per-mode reasoning effort defaults, add --xhigh override

xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations across gstack (resolvers, gen-skill-docs) from xhigh to high. Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: explicit plan content embedding for codex sandbox visibility

Codex runs sandboxed to repo root (-C) and cannot access ~/.claude/plans/. The template already instructed content embedding but wasn't explicit enough — Claude sometimes shortcut to referencing the file path, causing Codex to waste 10+ tool calls searching before giving up. Strengthen the instruction to make embedding unambiguous: "embed FULL CONTENT, do NOT reference the file path." Also extract referenced source file paths from the plan so Codex reads them directly instead of discovering via rg/find.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --xhigh reminder to challenge and consult modes

The --xhigh override was only documented in Step 2A (review). Steps 2B (challenge) and 2C (consult) lacked the reminder, so the flag would silently do nothing for those modes. Found by adversarial review.

* chore: bump version and changelog (v0.12.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 7 commits March 26, 2026 12:17

Merge remote-tracking branch 'origin/main' into garrytan/codex-hang-f…

6c54028

…ixes

fix: add --xhigh reminder to challenge and consult modes

b2fa051

The --xhigh override was only documented in Step 2A (review). Steps 2B (challenge) and 2C (consult) lacked the reminder, so the flag would silently do nothing for those modes. Found by adversarial review.

chore: bump version and changelog (v0.12.4.0)

fb70318

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve merge conflict, bump to v0.12.5.0

c8091b5

Main shipped v0.12.4.0 (full commit coverage in /ship) while this branch also used v0.12.4.0. Resolved by keeping both CHANGELOG entries and bumping this branch to v0.12.5.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

garrytan merged commit 1b60acd into main Mar 27, 2026
18 checks passed

garrytan merged 7 commits into main from garrytan/codex-hang-fixes
