fix(preprocess): deterministic Path-A for pre-validated harness + nudge prose-only agent turns #274
Open
iraj465 wants to merge 1 commit into
…ge prose-only agent turns
Two robust, workload-agnostic fixes for the v3 preprocess + agent loop that
caused fused_moe (and any shape-bearing) kernel runs to stall in mixed mode.
1. Pre-validated harness deterministic bypass (adapter.py, orchestrator.py).
When the caller supplies a harness it already validated end-to-end, the
entire A1 preprocess (collect_baseline -> collect_profile -> render_commandment)
is deterministic — there is nothing for the LLM orchestrator to decide.
Driving it through the LLM anyway let the Step-0 classifier misroute a
shape-bearing task to the harness-GENERATOR (regenerate from scratch) or
fail to converge, burning the whole preprocess budget (900s soft cap) with
no benchmark_baseline. `_run_prevalidated_path_a()` runs the deterministic
sequence directly and returns PreprocessResult(path_taken="A"), keeping the
same worktree-bypass validate_harness gate. Opt-out: GEAK_NO_PREVALIDATED_BYPASS=1.
A prompt-level exemption in the orchestrator Step-0 classifier is kept as a
secondary guard. Result: preprocess completes in ~260s instead of timing out.
2. Prose-only agent turns are nudged, not silently accepted (default.py).
`parse_action` returned {"output":"","returncode":0} for a turn with no
fenced bash and no tool call, and the `returncode == 0` check accepted it as
a successful no-op. A model that believes it already finished then repeats
prose every step with no corrective signal, looping until the step limit
(observed: heterogeneous task-planner, 143 prose turns -> 0 tool calls ->
LimitsExceeded). Track whether any action (bash/tool/skill) actually
dispatched; if not, raise FormatError (NonTerminating) so the model is told
to emit a real action. Multi-action turns still raise the existing error.
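The dispatch-tracking idea in fix 2 can be sketched roughly as below. This is a minimal illustration, not the code in default.py: `execute_turn`, `observe_turn`, `run_bash`, `run_tool`, and the regex are hypothetical names, the exception classes are local stand-ins for the project's `FormatError` and its non-terminating base class, and the existing multi-action check is left out.

```python
import re

# Built programmatically so the example can itself live inside a fenced block.
FENCE = "`" * 3
BASH_BLOCK = re.compile(FENCE + r"bash[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


class NonTerminatingException(Exception):
    """Stand-in: recoverable error that is reported to the model, then the loop continues."""


class FormatError(NonTerminatingException):
    """Stand-in: the model's turn did not contain a usable action."""


def execute_turn(message, tool_calls, run_bash, run_tool):
    """Run every action found in one model turn; reject turns with none."""
    result = {"output": "", "returncode": 0}
    dispatched = False  # flips to True only when something actually runs

    for command in BASH_BLOCK.findall(message):
        result = run_bash(command.strip())
        dispatched = True
    for call in tool_calls:
        result = run_tool(call)
        dispatched = True

    if not dispatched:
        # Before the fix, the empty result above (returncode 0) passed as a no-op.
        raise FormatError("No action found. Emit one fenced bash block or one tool call.")
    return result


def observe_turn(message, tool_calls, run_bash, run_tool):
    """Agent-loop view: a non-terminating error becomes the next observation."""
    try:
        return execute_turn(message, tool_calls, run_bash, run_tool)
    except NonTerminatingException as exc:
        return {"output": str(exc), "returncode": 1}
```

With this shape, a turn such as "Done." yields a corrective observation instead of an empty success, so the model receives a signal on the very next step rather than repeating prose until the step limit.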
Tests: new prose-only-nudge regression in tests/agents/test_default.py;
new pre-validated bypass + opt-out tests in tests/run/test_preprocess_v3_bugfixes.py.
All agent + preprocess suites pass.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Summary
Two robust, workload-agnostic fixes for the v3 preprocess + agent loop. Both were found while running the gpt-oss-120b `fused_moe` kernel-optimization flow in mixed mode, but neither is specific to that workload; they are general correctness fixes.
1. Pre-validated harness → deterministic Path-A bypass
`run_preprocess_v3` always drove preprocess through the LLM orchestrator, even when the caller supplied a pre-validated harness. The orchestrator's Step-0 "shapes pre-check" then either: … burning the entire preprocess budget (900s soft cap) without ever producing a `benchmark_baseline`, so the kernel run aborted.
A pre-validated harness already encodes its authoritative shapes internally: the whole A1 sequence (`collect_baseline → collect_profile → render_commandment`) is deterministic and needs no LLM. `_run_prevalidated_path_a()` runs it directly and returns `PreprocessResult(path_taken="A")`, preserving the existing worktree-bypass `validate_harness` gate. Profiling stays advisory/non-fatal (matches the orchestrator escape-hatch contract). Opt-out: `GEAK_NO_PREVALIDATED_BYPASS=1`. A prompt-level exemption in the Step-0 classifier is kept as a secondary guard.
Result: preprocess completes in ~260s instead of timing out at 900s.
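The control flow described above might look roughly like the sketch below. It is an illustration under assumptions, not the code in adapter.py or orchestrator.py: `run_preprocess`, `bypass_enabled`, the `steps` tuple, the `llm_orchestrator` callable, and every signature are hypothetical, the `PreprocessResult` dataclass is a stand-in whose only confirmed field is `path_taken`, and the `validate_harness` gate is omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class PreprocessResult:
    # Hypothetical stand-in; only path_taken="A" is confirmed by the PR text.
    path_taken: str
    baseline: Any = None
    profile: Any = None
    commandment: Any = None


def bypass_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """The bypass is on unless GEAK_NO_PREVALIDATED_BYPASS=1 is set."""
    env = os.environ if env is None else env
    return env.get("GEAK_NO_PREVALIDATED_BYPASS") != "1"


def run_prevalidated_path_a(harness, collect_baseline, collect_profile, render_commandment):
    """Run the A1 sequence directly, with no LLM in the loop."""
    baseline = collect_baseline(harness)  # a failure here propagates (fatal)
    try:
        profile = collect_profile(harness)
    except Exception:
        profile = None  # profiling is advisory, so a failure is non-fatal
    commandment = render_commandment(baseline, profile)
    return PreprocessResult("A", baseline, profile, commandment)


def run_preprocess(harness, prevalidated, steps, llm_orchestrator, env=None):
    """Take the deterministic path for pre-validated harnesses unless opted out."""
    if prevalidated and bypass_enabled(env):
        return run_prevalidated_path_a(harness, *steps)
    return llm_orchestrator(harness)
```

Here `steps` is assumed to be the tuple `(collect_baseline, collect_profile, render_commandment)`. Keeping the opt-out check in one small function means the environment variable is read in a single place and can be exercised in tests by passing a plain dict.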
2. Prose-only agent turns are nudged, not silently accepted
`DefaultAgent.parse_action` returned `{"output":"","returncode":0}` for a turn with no fenced bash and no tool call, and the final `if all_action["output"] or all_action["returncode"] == 0:` accepted it as a successful no-op. A model that believes it already finished (e.g. narrates "Done." / "tasks submitted") then repeats that prose every step with no corrective signal, looping until the step limit.
Observed in the heterogeneous task-planner: 143 prose turns → 0 tool calls → `LimitsExceeded`.
Fix: track whether any action (bash / tool / skill) actually dispatched; if not, raise `FormatError` (a `NonTerminatingException`) so the model is nudged to emit a real action. Multi-action turns still raise the existing format error. `test_empty_actions_handling` still passes (the nudge is non-terminating → the next turn submits).
Test plan
- `tests/agents/test_default.py`: new `test_prose_only_turn_is_nudged_not_silently_accepted`; full suite (13) passes
- `tests/run/test_preprocess_v3_bugfixes.py`: new `test_prevalidated_harness_bypasses_llm_orchestrator` + `test_prevalidated_bypass_opt_out_env`; full suite (16) passes
- `fused_moe` mixed-mode run now completes preprocess (263s) and the planner submits tasks that dispatch to workers
🤖 Generated with Claude Code