-
Notifications
You must be signed in to change notification settings - Fork 488
fix: tool responses printing for MultimodalConversableAgent handles multimodal content format #2043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@Parth220 could you run pre-commit? Thanks! https://docs.ag2.ai/latest/docs/contributor-guide/pre-commit/ |
Codecov Report❌ Patch coverage is
... and 36 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Code Review for PR #2043SummaryThis PR addresses a validation error when using Changes Review1. autogen/events/agent_events.py (Lines 245-253)Added: Multimodal content extraction for tool responses if role == "tool":
# Handle multimodal content format - extract text if content is a list of dicts
content = event.get('content')
if isinstance(content, list) and len(content) > 0 and isinstance(content[0], dict):
# Extract text from multimodal format [{'type': 'text', 'text': '...'}]
text_parts = []
for item in content:
if isinstance(item, dict) and item.get('type') == 'text':
text_parts.append(item.get('text', ''))
event['content'] = ''.join(text_parts)Issues Found:
2. autogen/agentchat/contrib/multimodal_conversable_agent.py (Lines 116-124)Added: Tool response unpacking logic # Fix tool response format for OpenAI API
fixed_messages = []
for msg in messages_with_b64_img:
if isinstance(msg, dict) and msg.get("role") == "tool" and "tool_responses" in msg:
# Unpack tool_responses to individual tool messages with tool_call_id
for tool_response in msg["tool_responses"]:
fixed_messages.append(tool_response)
else:
fixed_messages.append(msg)Issues Found:
Code Quality AssessmentStrengths:
Concerns:
Test Coverage IssuesMajor Concern: No Tests Added The PR description states:
However, no test files were modified in this PR. Codecov reports 14.28% patch coverage with 6 lines missing coverage. Required Tests:
Suggested Test Location:
Security ConcernsLow Risk, but consider:
Performance Considerations
RecommendationsHigh Priority (Should Fix)
Medium Priority (Should Consider)
Low Priority (Nice to Have)
Suggested Code ImprovementsFor agent_events.py: if role == "tool":
# Handle multimodal content format - extract text if content is a list of dicts
content = event.get('content')
if isinstance(content, list) and len(content) > 0:
# Extract text from multimodal format [{'type': 'text', 'text': '...'}]
text_parts = []
for item in content:
if not isinstance(item, dict):
continue # Skip non-dict items
if item.get('type') == 'text':
text_value = item.get('text')
if text_value is not None:
text_parts.append(str(text_value))
# Only modify if we found text content
if text_parts:
event = event.copy() # Don't mutate the original
event['content'] = ''.join(text_parts)
return ToolResponseEvent(**event, sender=sender.name, recipient=recipient.name, uuid=uuid)ConclusionOverall Assessment: Needs Work The PR addresses a legitimate issue with a reasonable approach, but has significant gaps:
Recommendation: Request changes
Estimated Effort to Fix: 2-3 hours
Additional Notes
Let me know if you need help writing the tests or making the recommended improvements! Review generated by Claude Code |
Why are these changes needed?
Fix for Autogen MultimodalConversableAgent Tool Response Issue
Problem
When using
MultimodalConversableAgentwith tool functions, the agent wraps tool responses in a multimodal format:[{'type': 'text', 'text': 'actual response text'}]However, the
ToolResponseEventin autogen expectscontentto be a simple string, causing validation errors:Root Cause
The
MultimodalConversableAgentformats all responses (including tool responses) in a multimodal format to support both text and images. But the event validation system expects tool responses to be simple strings.Solution
Modified
agent_events.pyin thecreate_received_event_modelfunction for "tool" responses to extract text from the multimodal format:This extracts the actual text content from the multimodal format before creating the ToolResponseEvent.
Related issue number
N/A
Checks