
fix: summarize tool messages during role reversal in user simulator (Python)#224

Open
Aryansharma28 wants to merge 5 commits into main from feat/role-reversal-py

Conversation

@Aryansharma28
Contributor

@Aryansharma28 Aryansharma28 commented Feb 17, 2026

What this fixes

The user simulator builds its prompt by calling reverse_roles() on the agent's conversation history. Two classes of messages cannot simply have their role flipped:

  1. role: "tool" messages — tool result messages. Relabelling them as role: "user" produces an invalid request that both OpenAI and Anthropic APIs reject outright.
  2. role: "assistant" messages with tool_calls — tool call messages. Same problem: tool_calls is not valid on a user message.

The old code worked around this by keeping tool-call messages as-is and dropping tool-result messages. That caused two separate bugs:

  • Consecutive assistant roles in the reversed history (the user simulator's context) when the agent made tool calls
  • Lost tool context — the user simulator had no idea what the agent had just done
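Both bugs can be reproduced with a minimal sketch. `old_style_reverse` below is a hypothetical approximation of the behaviour described above, not the actual previous implementation, and the message shapes assume the OpenAI chat format:

```python
def old_style_reverse(messages):
    """Hypothetical approximation of the old behaviour (not the real code)."""
    swap = {"user": "assistant", "assistant": "user"}
    out = []
    for m in messages:
        if m["role"] == "tool":
            continue  # tool results were dropped
        if m.get("tool_calls"):
            out.append(m)  # tool-call messages were kept as-is
            continue
        out.append({**m, "role": swap.get(m["role"], m["role"])})
    return out


history = [
    {"role": "user", "content": "Find me headphones"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "search_products",
                    "arguments": '{"query": "headphones"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '[{"name": "Acme ANC"}]'},
    {"role": "assistant", "content": "I found Acme ANC."},
]

old_result = old_style_reverse(history)
roles = [m["role"] for m in old_result]
print(roles)  # ['assistant', 'assistant', 'user']
```

The first two entries are consecutive assistant messages, and the tool result never reaches the simulator.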

What this changes

Both message types are now summarized as plain text and emitted as role: "user" messages (after reversal, the agent's turns appear as user turns):

  • {"role": "assistant", "tool_calls": [...]} → [Called tool search_products with: {"query": "headphones"}]
  • {"role": "tool", "content": "..."} → [Tool result from search_products: [...]]
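A rough sketch of that transformation is shown here. This is not the code in python/scenario/_utils/utils.py: the function name `summarize_reversed`, the lookup of tool names by `tool_call_id`, and joining multiple calls with newlines are all assumptions made for illustration. Text content on a tool-call message is dropped, matching the behaviour documented in this PR.

```python
def summarize_reversed(messages):
    """Illustrative sketch: flip roles and summarize tool traffic as text."""
    swap = {"user": "assistant", "assistant": "user"}
    names = {}  # tool_call_id -> tool name (assumed lookup strategy)
    out = []
    for m in messages:
        if m["role"] == "assistant" and m.get("tool_calls"):
            parts = []
            for call in m["tool_calls"]:
                fn = call["function"]
                names[call["id"]] = fn["name"]
                parts.append(f"[Called tool {fn['name']} with: {fn['arguments']}]")
            # Any text content on this message is intentionally dropped.
            out.append({"role": "user", "content": "\n".join(parts)})
        elif m["role"] == "tool":
            name = names.get(m.get("tool_call_id"), m.get("name", "unknown"))
            out.append(
                {"role": "user", "content": f"[Tool result from {name}: {m['content']}]"}
            )
        else:
            out.append({**m, "role": swap.get(m["role"], m["role"])})
    return out


history = [
    {"role": "user", "content": "Find me headphones"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "search_products",
                    "arguments": '{"query": "headphones"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '[{"name": "Acme ANC"}]'},
    {"role": "assistant", "content": "I found Acme ANC."},
]

reversed_history = summarize_reversed(history)
for m in reversed_history:
    print(m["role"], "|", m["content"])
```

Every message in the result carries only a role and string content, so nothing tool-specific is left for the provider to reject.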

This is exactly what the JS messageRoleReversal() already does. This PR brings the Python implementation into alignment.

A bare {"role": "assistant"} message with no content key at all is silently dropped — some models emit these and passing them through would produce an invalid {"role": "user"} message (Anthropic rejects it).
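The guard can be sketched as a simple filter. The helper name `drop_bare_assistant` and the exact condition are assumptions for illustration; the point is the distinction between a missing content key (dropped) and content set to None (kept):

```python
def drop_bare_assistant(messages):
    """Illustrative guard: drop assistant messages with no content key and no tool calls."""
    return [
        m
        for m in messages
        if not (
            m.get("role") == "assistant"
            and "content" not in m
            and not m.get("tool_calls")
        )
    ]


sample = [
    {"role": "user", "content": "hi"},
    {"role": "assistant"},                   # bare: dropped
    {"role": "assistant", "content": None},  # content key present: kept
    {"role": "assistant", "content": "hello"},
]
kept = drop_bare_assistant(sample)
print(len(kept))  # 3
```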

Files changed

  • python/scenario/_utils/utils.py — rewrites reverse_roles() with three new helper functions (_stringify_value, _has_tool_content, _summarize_tool_message)
  • python/tests/test_reverse_roles.py — 31 unit tests covering all message shapes and edge cases
  • python/examples/test_multiturn_tool_calls.py — E2E example: 10-turn shopping conversation where the agent makes tool calls on multiple turns, verifying the user simulator handles the full history correctly

Align Python reverse_roles() with JS messageRoleReversal().
Tool messages are now summarized as plain text instead of being
kept as-is or dropped, preventing consecutive assistant roles
and lost tool context in the user simulator's conversation history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@drewdrewthis drewdrewthis left a comment


Aryansharma28 and others added 3 commits February 17, 2026 17:12
Align Python reverse_roles() with JS messageRoleReversal().
Tool messages are now summarized as plain text instead of being
kept as-is or dropped, preventing consecutive assistant roles
and lost tool context in the user simulator's conversation history.

- Use isinstance() instead of type() == dict
- Add 30 unit tests covering all helper functions and edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@drewdrewthis drewdrewthis left a comment


The title of the PR doesn't say that it is summarizing tool calls. This is actually a bit concerning, and I think it needs further justification?

- _stringify_value: remove special None branch; json.dumps(None) = "null"
  which is JSON-consistent and avoids a Python-specific "None" string
- _summarize_tool_message: document that text content is intentionally
  dropped when an assistant message has both content and tool_calls
- reverse_roles: restore the guard that drops bare {"role": "assistant"}
  messages with no content key — Anthropic rejects them if passed through
  as {"role": "user"} with no content (regression from the original fix)
- test_reverse_roles: update test_none expectation to "null", loosen
  test_non_serializable_fallback off CPython-specific repr, rename
  test_message_without_content_preserved to clarify content=None vs
  missing key, add test_bare_role_only_message_is_dropped for the guard
- test_multiturn_tool_calls: fix mutable default argument in shopping_agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
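The `_stringify_value` change in this commit can be illustrated with a small sketch. `stringify_value_sketch` is a hypothetical stand-in, not the helper in utils.py; passing strings through unchanged and falling back to `str()` for non-serializable values are assumptions:

```python
import json


def stringify_value_sketch(value):
    """Illustrative stand-in for _stringify_value (assumed behaviour)."""
    if isinstance(value, str):
        return value  # assumption: strings are passed through unchanged
    try:
        # No special case for None: json.dumps(None) yields "null".
        return json.dumps(value)
    except (TypeError, ValueError):
        # Assumed fallback for values json cannot serialize.
        return str(value)


print(stringify_value_sketch(None))                      # null
print(stringify_value_sketch({"query": "headphones"}))   # {"query": "headphones"}
```

Because the fallback output depends on the object's own string representation, a test for it is safer asserting only that a string comes back than pinning an exact repr.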
@Aryansharma28 Aryansharma28 changed the title from "fix: role reversal for tool messages in user simulator (Python)" to "fix: summarize tool messages during role reversal in user simulator (Python)" on Mar 5, 2026
@Aryansharma28
Contributor Author

Hi @drewdrewthis — thanks for the review comments, addressing both below.

Re: tests — unit tests are now included in python/tests/test_reverse_roles.py (31 tests covering all message shapes), and there's also an E2E example in python/examples/test_multiturn_tool_calls.py that runs a 10-turn shopping conversation with real tool calls to verify the user simulator handles the full history end-to-end.

Re: the title / summarization concern — totally fair, the original title was vague. Updated the title and PR body to explain this properly, but here's the reasoning:

The user simulator builds its prompt by calling reverse_roles() on the agent's conversation history. The problem is that two message types cannot be re-labelled with a different role — they have to be converted to plain text:

  • role: "tool" (tool result messages): both the OpenAI and Anthropic APIs reject a tool result that has simply been relabelled as a user message, so you can't just flip the role.
  • role: "assistant" with tool_calls: same issue, since tool_calls is not valid on a user message.

So the options were:

  1. Drop them → user simulator loses all tool context and doesn't know what the agent did
  2. Keep them as-is → consecutive assistant roles in the history, which confuses the user simulator
  3. Summarize as plain text → the user simulator sees [Called tool X with: ...] / [Tool result from X: ...] as ordinary user messages — valid API request, and the simulator has full context

Option 3 is what the JS messageRoleReversal() already does. This PR brings Python into alignment with that.
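One way to check the "valid API request" property of option 3 is a small shape check over the reversed history. `is_plain_chat` is a hypothetical helper, not part of this PR, and its notion of validity (allowed roles, no tool_calls, string content) is an assumption rather than either provider's actual validation logic:

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def is_plain_chat(messages):
    """Return True if every message has a non-tool role and plain string content."""
    return all(
        m.get("role") in ALLOWED_ROLES
        and "tool_calls" not in m
        and isinstance(m.get("content"), str)
        for m in messages
    )


# Option 2 (keep as-is): a tool-call message survives in the history.
kept_as_is = [
    {"role": "assistant", "content": "Find me headphones"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
]

# Option 3 (summarize): tool traffic becomes ordinary text.
summarized = [
    {"role": "assistant", "content": "Find me headphones"},
    {"role": "user", "content": '[Called tool search_products with: {"query": "headphones"}]'},
    {"role": "user", "content": "[Tool result from search_products: [...]]"},
]

print(is_plain_chat(kept_as_is))   # False
print(is_plain_chat(summarized))   # True
```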


2 participants