
feat: add Codex parity, discuss-phase scouting, and agent quality guards #811

Open
Tibsfox wants to merge 3 commits into gsd-build:main from Tibsfox:feat/agent-discuss-codex-enhancements

Tibsfox (Contributor) commented Feb 28, 2026

Summary

This PR adds three enhancements that improve Codex runtime parity, discuss-phase intelligence, and agent execution quality:

  • Codex multi-agent config: Full request_user_input mapping in skill adapter, config.toml generation with per-agent .toml files, agent role headers, sandbox mode assignment, and clean uninstall support
  • Code-aware discuss phase: Codebase scouting step before gray area identification, code-context annotations in options, <code_context> section in CONTEXT.md template
  • Agent quality guards: Analysis paralysis guard (gsd-executor), exhaustive PROJECT.md cross-check (gsd-plan-checker), task-level TDD with <behavior> blocks (gsd-planner)

Motivation

Codex parity gap: The Codex runtime adapter had a minimal skill header with no AskUserQuestion mapping and no multi-agent configuration. GSD workflows that rely on interactive questioning (discuss-phase) or agent spawning (execute-phase) could not function on Codex. Agent .md files were converted with basic markdown transforms but lacked the <codex_agent_role> header and per-agent .toml sandbox configs that Codex requires for proper isolation.

Blind discuss-phase: The discuss-phase workflow generated gray areas purely from the ROADMAP.md phase description without examining the actual codebase. This meant it could not suggest reusing existing components, highlight established patterns, or annotate options with code context -- leading to decisions that ignored what was already built.

Agent execution drift: Three recurring failure modes in production:

  1. Executors entering read-loops (5+ consecutive Read/Grep/Glob calls) without writing code, consuming context budget on analysis instead of implementation
  2. Plan-checker verifying only the phase goal's requirements while silently dropping broader PROJECT.md requirements relevant to the phase
  3. Code-producing tasks in standard plans lacking test expectations, causing executors to write implementation-first code without behavioral contracts

Changes

Commit 1: bf26f95 -- Codex request_user_input, multi-agent config, agent role generation

bin/install.js (+298 lines):

  • CODEX_AGENT_SANDBOX map: 11 agents with sandbox modes (9 workspace-write, 2 read-only)
  • getCodexSkillAdapterHeader(): Expanded from 6-line stub to three structured sections:
    • Section A: Skill invocation syntax ($skillName, {{GSD_ARGS}})
    • Section B: AskUserQuestion to request_user_input parameter mapping (header, question, options, multiSelect workaround, Execute mode fallback)
    • Section C: Task() to spawn_agent mapping (agent_type, fork_context, parallel wait pattern, result markers, close_agent)
  • convertClaudeAgentToCodexAgent(): Adds <codex_agent_role> header with role/tools/purpose, cleans frontmatter (drops color/tools fields, quotes name/description)
  • generateCodexAgentToml(): Per-agent .toml with sandbox_mode and developer_instructions from agent body
  • generateCodexConfigBlock(): Generates [features] (multi_agent, default_mode_request_user_input) and [agents] (max_threads=4, max_depth=2) with per-agent sections referencing .toml config files
  • stripGsdFromCodexConfig(): Clean removal of GSD sections during uninstall (handles marker-based, injected keys, and [agents.gsd-*] sections)
  • mergeCodexConfig(): Three-case merge (new file, existing with marker, existing without marker with feature injection)
  • installCodexConfig(): Orchestrates agent discovery, .toml generation, and config merge
  • Test-mode export gate (GSD_TEST_MODE env var) for module-level testing without CLI side effects
  • Uninstall path: removes agent .toml files and cleans config.toml
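
As a rough illustration of the sandbox map and per-agent .toml generation described above, here is a minimal sketch. The function name `agentToml`, the two map entries, and the `read-only` default are assumptions made for the example; the actual contents of CODEX_AGENT_SANDBOX and generateCodexAgentToml() are not shown in this description. Encoding values with `JSON.stringify` is a simplification that yields valid TOML basic strings for typical text.

```javascript
// Hypothetical subset of an agent -> sandbox mode map (the real map has 11 entries).
const SANDBOX_MODES = new Map([
  ['gsd-executor', 'workspace-write'],
  ['gsd-plan-checker', 'read-only'],
]);

// Assumed fallback for agents that are not listed in the map.
const DEFAULT_SANDBOX = 'read-only';

// Build the text of a per-agent .toml file from the agent name and its body.
function agentToml(agentName, body) {
  const mode = SANDBOX_MODES.get(agentName) ?? DEFAULT_SANDBOX;
  return [
    `sandbox_mode = ${JSON.stringify(mode)}`,
    // JSON string escapes (\n, \", \\) are also valid in TOML basic strings.
    `developer_instructions = ${JSON.stringify(body.trim())}`,
    '',
  ].join('\n');
}

console.log(agentToml('gsd-executor', 'Implement the plan.\nCommit each task.\n'));
```

Keeping the sandbox decision in a single lookup table means a new agent only needs one map entry, and anything unlisted falls back to the more restrictive mode in this sketch.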

tests/codex-config.test.cjs (+412 lines, 30 tests across 8 suites):

  • getCodexSkillAdapterHeader: Section presence, invocation syntax, parameter mapping, spawn_agent mapping
  • convertClaudeAgentToCodexAgent: Frontmatter cleanup, slash command conversion, no-frontmatter passthrough
  • generateCodexAgentToml: Sandbox mode assignment (workspace-write, read-only, default), developer_instructions embedding
  • CODEX_AGENT_SANDBOX: Agent count (11), write/read-only classification
  • generateCodexConfigBlock: Marker, feature flags, agent limits, per-agent sections
  • stripGsdFromCodexConfig: GSD-only removal, user content preservation, injected key stripping, empty section cleanup, [agents.gsd-*] removal
  • mergeCodexConfig: Three cases + idempotency + existing [features] injection
  • installCodexConfig (integration): End-to-end with real agent files
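
The merge cases and the idempotency property these suites exercise can be sketched as follows. This is a simplified illustration, not the code in bin/install.js: the marker text and the function names `mergeConfig` and `stripConfig` are hypothetical, and the feature-key injection into an existing [features] section is left out.

```javascript
// Hypothetical marker line; the real marker text is not shown in this description.
const MARKER = '# GSD agent configuration (managed)';

// Merge a generated GSD block into an existing config.toml string (or null).
function mergeConfig(existing, gsdBlock) {
  const block = `${MARKER}\n${gsdBlock.trim()}\n`;

  // Case 1: no config file yet, so the block becomes the whole file.
  if (existing == null || existing.trim() === '') return block;

  const at = existing.indexOf(MARKER);

  // Case 2: marker present, so truncate at the marker and re-append.
  if (at !== -1) {
    const head = existing.slice(0, at).trimEnd();
    return head ? `${head}\n\n${block}` : block;
  }

  // Case 3: no marker, so keep the user's content and append the block.
  return `${existing.trimEnd()}\n\n${block}`;
}

// Uninstall: drop everything from the marker onward, keep user content.
function stripConfig(existing) {
  const at = existing.indexOf(MARKER);
  if (at === -1) return existing;
  const head = existing.slice(0, at).trimEnd();
  return head ? `${head}\n` : '';
}

const once = mergeConfig('model = "example"\n', '[agents]\nmax_threads = 4');
const twice = mergeConfig(once, '[agents]\nmax_threads = 4');
console.log(once === twice);
```

Because case 2 always truncates at the marker before re-appending, running the merge twice with the same block produces the same file, which is the idempotency the tests check for.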

Commit 2: 0dc8120 -- Code-aware discuss phase with codebase scouting (#727)

commands/gsd/discuss-phase.md:

  • Added Glob and Grep to allowed-tools (needed for codebase scouting)
  • Added Task and mcp__context7__* to allowed-tools for auto-advance and library documentation lookup
  • Updated process steps: scout codebase before analysis, code-informed gray areas, code-context in CONTEXT.md

get-shit-done/workflows/discuss-phase.md (+77 lines):

  • New <step name="scout_codebase"> between check_existing and analyze_phase:
    • Checks .planning/codebase/*.md maps first (CONVENTIONS, STRUCTURE, STACK)
    • Falls back to targeted grep using phase goal terms
    • Builds internal <codebase_context> (reusable assets, established patterns, integration points, creative options)
  • Updated analyze_phase to use codebase_context for grounded analysis
  • Updated present_gray_areas with code context annotation examples
  • Updated discuss_areas with code-context-annotated option examples and Context7 library lookup
  • Updated write_context to include <code_context> section (reusable assets, established patterns, integration points)
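
The scouting order described above (codebase maps first, targeted grep as the fallback) can be illustrated in code. The real step is a set of workflow instructions that the agent follows, not a JavaScript function, so everything below is a hypothetical sketch: the helper names and the rule for picking grep terms from the phase goal are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Return the paths of existing codebase maps, or null if there are none.
function findCodebaseMaps(projectRoot) {
  const dir = path.join(projectRoot, '.planning', 'codebase');
  if (!fs.existsSync(dir)) return null;
  const maps = fs.readdirSync(dir).filter((f) => f.endsWith('.md')).sort();
  return maps.length > 0 ? maps.map((f) => path.join(dir, f)) : null;
}

// Assumed heuristic: use distinct lowercase words of four or more characters.
function grepTermsFromGoal(goal) {
  const words = goal.toLowerCase().match(/[a-z][a-z0-9_-]{3,}/g) || [];
  return [...new Set(words)];
}

// Decide how to scout: read the maps if present, otherwise grep for goal terms.
function scoutPlan(projectRoot, phaseGoal) {
  const maps = findCodebaseMaps(projectRoot);
  return maps
    ? { source: 'maps', files: maps }
    : { source: 'grep', terms: grepTermsFromGoal(phaseGoal) };
}

console.log(scoutPlan(process.cwd(), 'Add user profile page'));
```

Preferring the maps keeps the step cheap when a project has already been mapped, and the grep fallback bounds the search to terms taken from the phase goal.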

get-shit-done/templates/context.md (+14 lines):

  • Added <code_context> section template with Reusable Assets, Established Patterns, and Integration Points subsections
  • Updated good examples to show code context usage

Commit 3: 9124906 -- Analysis paralysis guard, exhaustive cross-check, task-level TDD (#736)

agents/gsd-executor.md (+10 lines):

  • New <analysis_paralysis_guard> section: After 5+ consecutive Read/Grep/Glob calls without Edit/Write/Bash, executor must stop and either write code or report "blocked" with the specific missing information
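
The guard itself is prompt text in the agent definition, but the rule it states can be expressed as a small check over a tool-call history. This is a hypothetical sketch for illustration; the function name and the treatment of tools outside the two named groups are assumptions.

```javascript
const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob']);
const ACTION_TOOLS = new Set(['Edit', 'Write', 'Bash']);
const THRESHOLD = 5;

// True when the history ends in a run of 5+ read-only calls with no action call.
function shouldStop(toolCalls) {
  let streak = 0;
  for (const tool of toolCalls) {
    if (ACTION_TOOLS.has(tool)) {
      streak = 0; // any Edit/Write/Bash call resets the count
    } else if (READ_ONLY_TOOLS.has(tool)) {
      streak += 1;
    }
    // Assumption: other tools neither extend nor reset the streak.
  }
  return streak >= THRESHOLD;
}

console.log(shouldStop(['Read', 'Grep', 'Glob', 'Read', 'Read']));
```

When the check fires, the executor either writes code or reports "blocked" together with the specific missing information.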

agents/gsd-plan-checker.md (+2 lines):

  • Exhaustive cross-check in Step 4 (Requirement Coverage): Also read PROJECT.md requirements, not just the phase goal. Any unmapped PROJECT.md requirement relevant to the phase is an automatic blocker
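
In effect, the cross-check is a set difference between the PROJECT.md requirements relevant to the phase and the requirements the plan covers. The sketch below assumes a hypothetical data shape (requirement objects with `id` and `relevantToPhase`, tasks with a `requirements` array); the checker itself works from the markdown files through its prompt, not through this function.

```javascript
// Return one blocker issue per relevant requirement that no task covers.
function findUnmappedRequirements(projectRequirements, planTasks) {
  const covered = new Set(planTasks.flatMap((task) => task.requirements || []));
  return projectRequirements
    .filter((req) => req.relevantToPhase && !covered.has(req.id))
    .map((req) => ({ severity: 'blocker', requirement: req.id }));
}

const issues = findUnmappedRequirements(
  [
    { id: 'REQ-01', relevantToPhase: true },
    { id: 'REQ-02', relevantToPhase: true },
    { id: 'REQ-03', relevantToPhase: false },
  ],
  [{ name: 'Task 1', requirements: ['REQ-01'] }]
);
console.log(issues);
```

Listing each unmapped requirement explicitly, instead of returning a single pass/fail flag, gives the planner a concrete item to address for every gap.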

agents/gsd-planner.md (+20 lines):

  • Task-level TDD guidance: When a task creates/modifies production code, add tdd="true" and <behavior> block with explicit test expectations
  • XML example showing the full task structure with <behavior> element
  • Exception list: checkpoint tasks, config-only, docs, migrations, glue code, styling-only
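
One way to picture the task-level TDD rule and its exception list is as a lint over planned tasks. The task shape (`kind`, `tdd`, `behavior`) and the kind labels below are hypothetical stand-ins for the XML task attributes; this is a sketch of the rule, not part of the PR.

```javascript
// Task kinds exempt from the <behavior> requirement, following the exception list.
const TDD_EXEMPT = new Set([
  'checkpoint', 'config-only', 'docs', 'migration', 'glue', 'styling-only',
]);

// Return the names of code-producing tasks that lack explicit test expectations.
function tasksMissingBehavior(tasks) {
  return tasks
    .filter((task) => !TDD_EXEMPT.has(task.kind))
    .filter((task) => !(task.tdd === true
      && Array.isArray(task.behavior)
      && task.behavior.length > 0))
    .map((task) => task.name);
}

console.log(tasksMissingBehavior([
  { name: 'Add login handler', kind: 'code', tdd: true, behavior: ['rejects empty password'] },
  { name: 'Add session store', kind: 'code' },
  { name: 'Update README', kind: 'docs' },
]));
```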

Relationship to Other PRs

This is PR #4 of 6 from the dev-bugfix branch:

| PR | Title | Status | Dependency |
|----|-------|--------|------------|
| #2 | Milestone completion bugs | Merged | None |
| #3 | Cross-platform Windows CI fixes | Open | None |
| #4 (this) | Codex parity, discuss scouting, agent guards | Open | None |
| #5 | Agent frontmatter + heredoc fix | Open | Merge #4 first |
| #6 | CLI/config bug fixes | Open | None |
| #1 | MCP migration helper | Open | Targets dev-bugfix |

PR #5 dependency: PR #5 adds agent frontmatter parsing improvements that build on the agent definitions modified here (gsd-executor, gsd-planner, gsd-plan-checker). No code conflicts, but they touch overlapping agent files. Recommend merging #4 before #5.

Testing

New Tests (codex-config.test.cjs)

| Suite | Tests | Focus |
|-------|-------|-------|
| getCodexSkillAdapterHeader | 4 | Section presence, invocation syntax, AskUserQuestion mapping, Task mapping |
| convertClaudeAgentToCodexAgent | 3 | Frontmatter cleanup, slash command conversion, no-frontmatter passthrough |
| generateCodexAgentToml | 4 | Sandbox modes (workspace-write, read-only, default), developer_instructions |
| CODEX_AGENT_SANDBOX | 3 | Agent count (11), write classification, read-only classification |
| generateCodexConfigBlock | 4 | Marker, feature flags, agent limits, per-agent sections |
| stripGsdFromCodexConfig | 5 | GSD-only, user preservation, injected keys, empty sections, agent sections |
| mergeCodexConfig | 5 | Create new, replace existing, append without marker, inject features, idempotency |
| installCodexConfig (integration) | 1 | End-to-end with real agent .md files |
| Total new | 30 | |

Full Suite Results

# tests 449
# suites 87
# pass 449
# fail 0
# cancelled 0
# skipped 0
# duration_ms 7032

All 449 tests pass (30 new + 419 existing). Zero failures, zero skipped.

Impact

  • Codex users: Can now run the full GSD workflow on Codex -- discuss-phase questioning works via request_user_input, agent spawning works via spawn_agent, and each agent gets proper sandbox isolation through generated .toml configs
  • All runtimes: discuss-phase now scouts the codebase before asking questions, producing more relevant gray areas and code-aware options. CONTEXT.md captures reusable assets and integration points for downstream agents
  • Agent quality: Executor paralysis guard prevents context budget waste on read-loops. Plan-checker exhaustive cross-check catches silently dropped requirements. Task-level TDD ensures behavioral contracts exist before implementation
  • No breaking changes: All changes are additive. Existing workflows, agent behavior, and test infrastructure are unaffected
  • Files changed: 8 files, +845/-25 lines

glittercowboy and others added 3 commits February 28, 2026 02:23
…agent role generation

Expand Codex adapter with AskUserQuestion → request_user_input parameter
mapping (including multiSelect workaround and Execute mode fallback) and
Task() → spawn_agent mapping (parallel fan-out, result parsing).

Add convertClaudeAgentToCodexAgent() that generates <codex_agent_role>
headers with role/tools/purpose and cleans agent frontmatter.

Generate config.toml with [features] (multi_agent, request_user_input)
and [agents.gsd-*] role sections pointing to per-agent .toml configs
with sandbox_mode (workspace-write/read-only) and developer_instructions.

Config merge handles 3 cases: new file, existing with GSD marker
(truncate + re-append), existing without marker (inject features +
append agents). Uninstall strips all GSD content including injected
feature keys while preserving user settings.

Closes gsd-build#779

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add lightweight codebase scanning before gray area identification:
- New scout_codebase step checks for existing maps or does targeted grep
- Gray areas annotated with code context (existing components, patterns)
- Discussion options informed by what already exists in the codebase
- Context7 integration for library-specific questions
- CONTEXT.md template includes code_context section

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…nd task-level TDD (gsd-build#736)

- gsd-executor: Add <analysis_paralysis_guard> block after deviation_rules.
  If executor makes 5+ consecutive Read/Grep/Glob calls without any
  Edit/Write/Bash action, it must stop and either write or report blocked.
  Prevents infinite analysis loops that stall execution.

- gsd-plan-checker: Add exhaustive cross-check in Step 4 requirement coverage.
  Checker now also reads PROJECT.md requirements (not just phase goal) to
  verify no relevant requirement is silently dropped. Unmapped requirements
  become automatic blockers listed explicitly in issues.

- gsd-planner: Add task-level TDD guidance alongside existing TDD Detection.
  For code-producing tasks in standard plans, tdd="true" + <behavior> block
  makes test expectations explicit before implementation. Complements the
  existing dedicated TDD plan approach — both can coexist.

Co-authored-by: CyPack <GITHUB_EMAIL_ADRESIN>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

4 participants