fix(preprocess): deterministic Path-A for pre-validated harness + nudge prose-only agent turns #274

Open
iraj465 wants to merge 1 commit into main from fix/preprocess-prevalidated-and-prose-stall

iraj465 (Collaborator) commented Jun 10, 2026

Summary

Two robust, workload-agnostic fixes for the v3 preprocess + agent loop. Both were found while running the gpt-oss-120b fused_moe kernel-optimization flow in mixed mode, but neither is specific to that workload — they are general correctness fixes.

1. Pre-validated harness → deterministic Path-A bypass

run_preprocess_v3 always drove preprocess through the LLM orchestrator, even when the caller supplied a pre-validated harness. The orchestrator's Step-0 "shapes pre-check" then either:

  • diverted a shape-bearing task to the harness-generator (regenerate a harness from scratch), or
  • simply failed to converge (100+ LLM steps),

…burning the entire preprocess budget (900s soft cap) without ever producing a benchmark_baseline, so the kernel run aborted.

A pre-validated harness already encodes its authoritative shapes internally — the whole A1 sequence (collect_baseline → collect_profile → render_commandment) is deterministic and needs no LLM. _run_prevalidated_path_a() runs it directly and returns PreprocessResult(path_taken="A"), preserving the existing worktree-bypass validate_harness gate. Profiling stays advisory/non-fatal (matches the orchestrator escape-hatch contract). Opt-out: GEAK_NO_PREVALIDATED_BYPASS=1. A prompt-level exemption in the Step-0 classifier is kept as a secondary guard.

Result: preprocess completes in ~260s instead of timing out at 900s.
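The bypass described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function signatures, the fields on `PreprocessResult` other than `path_taken`, and the `bypass_enabled` helper are assumptions made for the example. Only the step names, the `path_taken="A"` result, the advisory profiling behaviour, and the `GEAK_NO_PREVALIDATED_BYPASS=1` opt-out come from the description.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Callable, List, Mapping, Optional


@dataclass
class PreprocessResult:
    # Only path_taken is named in the PR; the other fields are hypothetical.
    path_taken: str
    baseline: Optional[Any] = None
    profile: Optional[Any] = None
    commandment: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def bypass_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: the bypass is on unless the opt-out variable is '1'."""
    env = os.environ if env is None else env
    return env.get("GEAK_NO_PREVALIDATED_BYPASS") != "1"


def run_prevalidated_path_a(
    harness: Any,
    validate_harness: Callable[[Any], bool],
    collect_baseline: Callable[[Any], Any],
    collect_profile: Callable[[Any], Any],
    render_commandment: Callable[[Any, Any], str],
) -> PreprocessResult:
    """Run the A1 sequence without an LLM (signatures are assumed)."""
    # The validation gate is kept: an invalid harness must not be benchmarked.
    if not validate_harness(harness):
        raise ValueError("pre-validated harness failed validate_harness")

    # The baseline is required; a failure here propagates to the caller.
    baseline = collect_baseline(harness)

    # Profiling is advisory: record the failure and carry on without a profile.
    warnings: List[str] = []
    try:
        profile = collect_profile(harness)
    except Exception as exc:
        profile = None
        warnings.append(f"profiling skipped: {exc}")

    commandment = render_commandment(baseline, profile)
    return PreprocessResult("A", baseline, profile, commandment, warnings)
```

A caller would presumably check `bypass_enabled()` and the presence of a pre-validated harness first, and fall back to the LLM orchestrator otherwise.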

2. Prose-only agent turns are nudged, not silently accepted

DefaultAgent.parse_action returned {"output":"","returncode":0} for a turn with no fenced bash block and no tool call, and the final check, `if all_action["output"] or all_action["returncode"] == 0:`, accepted it as a successful no-op. A model that believes it has already finished (e.g. narrates "Done." / "tasks submitted") then repeats that prose every step with no corrective signal, looping until the step limit.

Observed in the heterogeneous task-planner: 143 prose turns → 0 tool calls → LimitsExceeded. Fix: track whether any action (bash / tool / skill) actually dispatched; if not, raise FormatError (a NonTerminatingException) so the model is nudged to emit a real action. Multi-action turns still raise the existing format error. test_empty_actions_handling still passes (the nudge is non-terminating → the next turn submits).
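To make the failure mode and the fix concrete, here is a small self-contained sketch. It is not the `DefaultAgent` code: `run_turn`, `agent_loop`, `old_accepts`, and the way actions are represented as a list are all hypothetical, and the exception classes are local stand-ins that only mirror the names `FormatError` and `NonTerminatingException` used above.

```python
from typing import Callable, Dict, List


class NonTerminatingException(Exception):
    """Stand-in: an error that is fed back to the model instead of ending the run."""


class FormatError(NonTerminatingException):
    """Stand-in: the model's turn did not contain a usable action."""


def old_accepts(all_action: Dict) -> bool:
    # The pre-fix condition: an empty result with returncode 0 counts as success.
    return bool(all_action["output"] or all_action["returncode"] == 0)


def run_turn(actions: List[str], dispatch: Callable[[str], Dict]) -> Dict:
    """Dispatch the single action in a turn; raise FormatError for zero or many."""
    if len(actions) > 1:
        raise FormatError("Emit exactly one action per turn.")
    dispatched = False
    result = {"output": "", "returncode": 0}
    for action in actions:
        result = dispatch(action)
        dispatched = True
    if not dispatched:
        # Prose-only turn: nudge the model rather than accept a silent no-op.
        raise FormatError("No action found; emit a bash block or tool call.")
    return result


def agent_loop(turns: List[List[str]], dispatch: Callable[[str], Dict]) -> List[Dict]:
    """Feed non-terminating errors back as messages and keep going."""
    messages: List[Dict] = []
    for actions in turns:
        try:
            messages.append(run_turn(actions, dispatch))
        except NonTerminatingException as exc:
            messages.append({"nudge": str(exc)})
    return messages
```

In this sketch a prose-only turn followed by a real one yields a nudge message and then the dispatched result, so the run continues instead of terminating or looping silently.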

Test plan

  • tests/agents/test_default.py — new test_prose_only_turn_is_nudged_not_silently_accepted; full suite (13) passes
  • tests/run/test_preprocess_v3_bugfixes.py — new test_prevalidated_harness_bypasses_llm_orchestrator + test_prevalidated_bypass_opt_out_env; full suite (16) passes
  • End-to-end: gpt-oss-120b fused_moe mixed-mode run now completes preprocess (263s) and the planner submits tasks that dispatch to workers

🤖 Generated with Claude Code

…ge prose-only agent turns

Two robust, workload-agnostic fixes for the v3 preprocess + agent loop that
caused fused_moe (and any shape-bearing) kernel runs to stall in mixed mode.

1. Pre-validated harness deterministic bypass (adapter.py, orchestrator.py).
   When the caller supplies a harness it already validated end-to-end, the
   entire A1 preprocess (collect_baseline -> collect_profile -> render_commandment)
   is deterministic — there is nothing for the LLM orchestrator to decide.
   Driving it through the LLM anyway let the Step-0 classifier misroute a
   shape-bearing task to the harness-GENERATOR (regenerate from scratch) or
   fail to converge, burning the whole preprocess budget (900s soft cap) with
   no benchmark_baseline. `_run_prevalidated_path_a()` runs the deterministic
   sequence directly and returns PreprocessResult(path_taken="A"), keeping the
   same worktree-bypass validate_harness gate. Opt-out: GEAK_NO_PREVALIDATED_BYPASS=1.
   A prompt-level exemption in the orchestrator Step-0 classifier is kept as a
   secondary guard. Result: preprocess completes in ~260s instead of timing out.

2. Prose-only agent turns are nudged, not silently accepted (default.py).
   `parse_action` returned {"output":"","returncode":0} for a turn with no
   fenced bash and no tool call, and the `returncode == 0` check accepted it as
   a successful no-op. A model that believes it already finished then repeats
   prose every step with no corrective signal, looping until the step limit
   (observed: heterogeneous task-planner, 143 prose turns -> 0 tool calls ->
   LimitsExceeded). Track whether any action (bash/tool/skill) actually
   dispatched; if not, raise FormatError (NonTerminating) so the model is told
   to emit a real action. Multi-action turns still raise the existing error.

Tests: new prose-only-nudge regression in tests/agents/test_default.py;
new pre-validated bypass + opt-out tests in tests/run/test_preprocess_v3_bugfixes.py.
All agent + preprocess suites pass.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Base automatically changed from gwiab-scheduler to main June 12, 2026 12:01