fix(reflect): unwrap JSON answer envelopes#2345
Conversation
nicoloboschi
left a comment
There was a problem hiding this comment.
I tried to reproduce #2298 against the exact model from the issue (ollama-cloud/gpt-oss:120b-cloud, per @benfrank241's comment) before reviewing, and the investigation changes what the right fix looks like.
Reproduction: 35/35 clean runs, no envelope
I replicated the exact reflect LLM call — real build_system_prompt_for_tools + the real done tool schema (including the directive-compliance field from _build_done_tool_with_directives) — and ran a minimal tool loop against gpt-oss:120b-cloud:
| Effort | Runs | Envelope leaks | Clean native done() |
|---|---|---|---|
| low | 6 | 0 | 6 |
| medium | 8 + 15 (heavy "prove per-directive compliance or be REJECTED" directives) | 0 | 23 |
| high | 6 | 0 | 6 |
Every run produced a proper native done tool call with clean markdown in answer. I could not trigger the JSON envelope even with aggressive compliance-checklist directives. (Tested through a current ollama; harmony tool-call parsing surfaces the call natively.)
The root-cause framing in #2298 is wrong
directive_compliance is not a field the model invents to "prove" compliance. It's a real, required Hindsight field — tools_schema.py::_build_done_tool_with_directives() adds it to the done tool whenever directives are present. So the envelope the reporter saw:
{"answer": "...", "directive_compliance": "...", "memory_ids": [...], "mental_model_ids": [], "observation_ids": []}is literally the done tool-call arguments, serialized as text instead of delivered as a native tool call. This is a tool-call delivery/parsing issue specific to gpt-oss's harmony format (the repo already band-aids this — see the done<|channel|>commentary handling in _normalize_tool_name and the done({...})-as-text regex _DONE_CALL_PATTERN). It is not the prompt-anchoring problem the issue hypothesizes. The leak appears to be environment-specific (older ollama / the ollama.com endpoint where the harmony channel isn't parsed into native tool calls).
What that means for this PR
-
The unwrap post-processing is the genuinely useful part. It recovers the answer from a leaked done-args envelope at
_clean_answer_text(the no-native-tool text path) and_clean_done_answer— exactly where a leaked call lands. Includingdirective_compliancein the allowlist is correct (it's a real schema field, not overfitting), and the subset guard sensibly avoids clobbering legitimate JSON answers. This is the same family of recovery shim the repo already ships for harmony leaks. I'd suggest renaming/recommenting it to reflect the real mechanism — "recover adonetool-call that was rendered as text" — rather than "model-invented envelope," so the next reader isn't misled. -
The prompt line is the weak half, and slightly self-contradictory. It instructs the model to "never include fields like
directive_compliance... in the answer text," while thedonetool schema in the same prompt requires the model to producedirective_compliance. Since the actual failure is at the parsing layer (a tool call that wasn't parsed as one), a generation-side instruction is aimed at the wrong layer — not harmful, but unlikely to be what fixes the reported symptom. I'd drop or soften it.
Suggestion
Land the unwrap shim (it's a reasonable, correctly-shaped band-aid for a provider leak that's outside this codebase to fix properly), but reframe the PR/issue away from "the model invents a JSON envelope to prove compliance." The durable fix is at the harmony tool-call parsing layer / an ollama version floor; worth a note in the issue since the symptom isn't reproducible on a current parser.
|
@nicoloboschi Addressed in c8cace0.
Validation:
|
Summary
Closes #2298
Tests