fix: summarize tool messages during role reversal in user simulator (Python)#224
Aryansharma28 wants to merge 5 commits into main
Align Python reverse_roles() with JS messageRoleReversal(). Tool messages are now summarized as plain text instead of being kept as-is or dropped, preventing consecutive assistant roles and lost tool context in the user simulator's conversation history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
drewdrewthis
left a comment
@Aryansharma28 tests
Align Python reverse_roles() with JS messageRoleReversal(). Tool messages are now summarized as plain text instead of being kept as-is or dropped, preventing consecutive assistant roles and lost tool context in the user simulator's conversation history.
- Use isinstance() instead of type() == dict
- Add 30 unit tests covering all helper functions and edge cases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
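For context on the isinstance() change, here is a small sketch of how the two checks differ for dict subclasses. The helper names are hypothetical and are not taken from the PR.

```python
from collections import OrderedDict


def is_dict_exact(value) -> bool:
    # Matches only plain dict instances, not subclasses.
    return type(value) == dict


def is_dict_instance(value) -> bool:
    # Matches dict and any subclass, such as OrderedDict.
    return isinstance(value, dict)


message = OrderedDict(role="assistant", content="hi")
print(is_dict_exact(message), is_dict_instance(message))
```

A message delivered as a dict subclass would be skipped by the exact type comparison but handled by isinstance().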
drewdrewthis
left a comment
The title of the PR doesn't say that it is summarizing tool calls. This is actually a bit concerning and I think it needs further justification?
- _stringify_value: remove special None branch; json.dumps(None) = "null"
which is JSON-consistent and avoids a Python-specific "None" string
- _summarize_tool_message: document that text content is intentionally
dropped when an assistant message has both content and tool_calls
- reverse_roles: restore the guard that drops bare {"role": "assistant"}
messages with no content key — Anthropic rejects them if passed through
as {"role": "user"} with no content (regression from the original fix)
- test_reverse_roles: update test_none expectation to "null", loosen
test_non_serializable_fallback off CPython-specific repr, rename
test_message_without_content_preserved to clarify content=None vs
missing key, add test_bare_role_only_message_is_dropped for the guard
- test_multiturn_tool_calls: fix mutable default argument in shopping_agent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
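To illustrate the _stringify_value change described in this commit, here is a minimal sketch of a JSON-first stringifier. It is not the PR's code: the function name, the pass-through for strings, and the str() fallback are assumptions. Only the json.dumps(None) behaviour is taken from the commit message.

```python
import json
from typing import Any


def stringify_value_sketch(value: Any) -> str:
    """Render a value as text for a tool summary (hypothetical stand-in)."""
    if isinstance(value, str):
        # Assumption: strings pass through without extra JSON quoting.
        return value
    try:
        # No special None branch: json.dumps(None) yields "null".
        return json.dumps(value)
    except (TypeError, ValueError):
        # Assumption: non-serializable values fall back to str().
        return str(value)
```

With this shape, None and other JSON-compatible values are rendered consistently, and a test for the fallback only needs to check that some string is returned rather than a CPython-specific repr.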
Hi @drewdrewthis, thanks for the review comments, addressing both below.

Re: tests: unit tests are now included in

Re: the title / summarization concern: totally fair, the original title was vague. Updated the title and PR body to explain this properly, but here's the reasoning:

The user simulator builds its prompt by calling
So the options were:
Option 3 is what the JS
What this fixes

The user simulator builds its prompt by calling reverse_roles() on the agent's conversation history. Two classes of messages cannot simply have their role flipped:

- role: "tool" messages (tool results). Relabelling them as role: "user" produces an invalid request that both OpenAI and Anthropic APIs reject outright.
- role: "assistant" messages with tool_calls (tool calls). Same problem: tool_calls is not valid on a user message.

The old code worked around this by keeping tool-call messages as-is and dropping tool-result messages. That caused two separate bugs:

- Consecutive assistant roles in the reversed history (the user simulator's context) when the agent made tool calls
- Lost tool context, because the dropped tool results never reached the user simulator

What this changes

Both message types are now summarized as plain text and attributed to role: "user" (i.e. the agent's perspective after reversal):

- {"role": "assistant", "tool_calls": [...]} → [Called tool search_products with: {"query": "headphones"}]
- {"role": "tool", "content": "..."} → [Tool result from search_products: [...]]

This is exactly what the JS messageRoleReversal() already does. This PR brings the Python implementation into alignment.

A bare {"role": "assistant"} message with no content key at all is silently dropped. Some models emit these, and passing them through would produce an invalid {"role": "user"} message (Anthropic rejects it).

Files changed

- python/scenario/_utils/utils.py: rewrites reverse_roles() with three new helper functions (_stringify_value, _has_tool_content, _summarize_tool_message)
- python/tests/test_reverse_roles.py: 31 unit tests covering all message shapes and edge cases
- python/examples/test_multiturn_tool_calls.py: E2E example, a 10-turn shopping conversation where the agent makes tool calls on multiple turns, verifying the user simulator handles the full history correctly
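For reviewers who want the behaviour at a glance, here is a minimal sketch of the role reversal described above. It is not the code in utils.py: the function names are hypothetical, the messages are assumed to follow the OpenAI-style shape (tool_calls[].function.name and .arguments), and reading the tool name from a "name" key on the tool message is an assumption.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def summarize_tool_message_sketch(message: Message) -> str:
    """Turn a tool call or tool result into a plain-text summary."""
    if message.get("role") == "tool":
        # Assumption: the tool name is available under "name".
        name = message.get("name", "unknown")
        return f"[Tool result from {name}: {message.get('content')}]"
    summaries = []
    for call in message.get("tool_calls", []):
        fn = call.get("function", {})
        summaries.append(f"[Called tool {fn.get('name')} with: {fn.get('arguments')}]")
    # Any text content on the same assistant message is intentionally dropped.
    return "\n".join(summaries)


def reverse_roles_sketch(messages: List[Message]) -> List[Message]:
    """Flip user/assistant roles, summarizing tool traffic as user text."""
    reversed_messages: List[Message] = []
    for message in messages:
        role = message.get("role")
        if role == "tool" or message.get("tool_calls"):
            summary = summarize_tool_message_sketch(message)
            reversed_messages.append({"role": "user", "content": summary})
        elif role == "assistant":
            if "content" not in message:
                continue  # bare {"role": "assistant"}: drop it
            reversed_messages.append({**message, "role": "user"})
        elif role == "user":
            reversed_messages.append({**message, "role": "assistant"})
        else:
            reversed_messages.append(message)  # e.g. system messages
    return reversed_messages
```

A short usage example with one tool round trip:

```python
history = [
    {"role": "user", "content": "Find headphones"},
    {"role": "assistant", "tool_calls": [{"id": "c1", "type": "function", "function": {
        "name": "search_products", "arguments": '{"query": "headphones"}'}}]},
    {"role": "tool", "name": "search_products", "content": "[]"},
    {"role": "assistant", "content": "No results."},
]
for reversed_message in reverse_roles_sketch(history):
    print(reversed_message)
```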