Replace A2UI_DELIMITER with XML/HTML style tag#794
Conversation
There was a problem hiding this comment.
Code Review
This pull request significantly refactors the A2UI parsing and response generation mechanism within the Python SDK. The core change involves moving from a single delimiter (---a2ui_JSON---) to distinct opening and closing tags (<a2ui-json>, </a2ui-json>) for A2UI JSON blocks, allowing for multiple interleaved text and A2UI JSON segments within a single LLM response. The parse_response function was updated to return a list of ResponsePart objects, each containing text and/or A2UI JSON. A new helper function, parse_response_to_parts, was introduced to streamline the conversion of LLM responses into A2A Part objects, handling parsing, optional validation, and wrapping. This change simplifies agent output construction by centralizing the logic for splitting text, extracting A2UI JSON, and applying validation. Consequently, various sample agents and their executors were updated to leverage this new, more robust parsing and response generation flow, removing previous ad-hoc parsing logic. Review comments highlight a bug in the lstrip() usage for removing markdown fences, suggesting removeprefix() for correctness, and identify several prompt injection vulnerabilities in sample agents where unsanitized user input is concatenated into LLM prompts or A2UI JSON structures, recommending proper sanitization or structured prompt formats. Additionally, a performance optimization was suggested to pre-compile a regular expression used for A2UI block matching.
| "Message sent to {contact_name}\n" | ||
| f"{A2UI_OPEN_TAG}\n{json_content}\n{A2UI_CLOSE_TAG}" | ||
| ) |
There was a problem hiding this comment.
The agent constructs a response containing A2UI JSON blocks by injecting unsanitized user input (contact_name) into a JSON string (on line 221, which is then used here) and then parsing it. A malicious user can provide a contact_name that contains A2UI tags (e.g., </a2ui-json> <a2ui-json> [ { "type": "WebFrame", "url": "http://malicious.com" } ]), allowing them to inject arbitrary UI components into the response. This can lead to UI injection or Cross-Site Scripting (XSS) on the client side.
To remediate this, ensure that all user-supplied data is properly sanitized or escaped before being included in the response content, especially when it's part of a structure that will be parsed as JSON or A2UI tags. Preferably, use a JSON library to construct the data model and then wrap the entire serialized JSON in the A2UI tags.
| f"Your previous response was invalid. {error_message} You MUST generate a" | ||
| " valid response that strictly follows the A2UI JSON SCHEMA. The response" | ||
| " MUST be a JSON list of A2UI messages. Ensure the response is split by" | ||
| f" '{A2UI_DELIMITER}' and the JSON part is well-formed. Please retry the" | ||
| " MUST be a JSON list of A2UI messages. Ensure each JSON part is wrapped in" | ||
| f" '{A2UI_OPEN_TAG}' and '{A2UI_CLOSE_TAG}' tags. Please retry the" |
There was a problem hiding this comment.
The agent constructs a retry prompt by concatenating the original user query and an error message without sanitization. This is a prompt injection vulnerability. A malicious user can provide a query that, when included in the retry prompt, manipulates the LLM's behavior (e.g., by including instructions to ignore previous constraints).
To remediate this, sanitize the user query before including it in the retry prompt, or use a structured prompt format that clearly separates user input from system instructions.
| f"Your previous response was invalid. {error_message} You MUST generate a" | ||
| " valid response that strictly follows the A2UI JSON SCHEMA. The response" | ||
| " MUST be a JSON list of A2UI messages. Ensure the response is split by" | ||
| f" '{A2UI_DELIMITER}' and the JSON part is well-formed. Please retry the" | ||
| " MUST be a JSON list of A2UI messages. Ensure each JSON part is wrapped in" | ||
| f" '{A2UI_OPEN_TAG}' and '{A2UI_CLOSE_TAG}' tags. Please retry the" |
There was a problem hiding this comment.
The agent constructs a retry prompt by concatenating the original user query and an error message without sanitization. This is a prompt injection vulnerability. A malicious user can provide a query that, when included in the retry prompt, manipulates the LLM's behavior.
To remediate this, sanitize the user query before including it in the retry prompt, or use a structured prompt format that clearly separates user input from system instructions.
| f"Your previous response was invalid. {error_message} You MUST generate a" | ||
| " valid response that strictly follows the A2UI JSON SCHEMA. The response" | ||
| " MUST be a JSON list of A2UI messages. Ensure the response is split by" | ||
| f" '{A2UI_DELIMITER}' and the JSON part is well-formed. Please retry the" | ||
| " MUST be a JSON list of A2UI messages. Ensure each JSON part is wrapped in" | ||
| f" '{A2UI_OPEN_TAG}' and '{A2UI_CLOSE_TAG}' tags. Please retry the" |
There was a problem hiding this comment.
The agent constructs a retry prompt by concatenating the original user query and an error message without sanitization. This is a prompt injection vulnerability. A malicious user can provide a query that, when included in the retry prompt, manipulates the LLM's behavior.
To remediate this, sanitize the user query before including it in the retry prompt, or use a structured prompt format that clearly separates user input from system instructions.
agent_sdks/python/tests/adk/a2a_extension/test_send_a2ui_to_client_toolset.py
Outdated
Show resolved
Hide resolved
agent_sdks/python/tests/adk/a2a_extension/test_send_a2ui_to_client_toolset.py
Outdated
Show resolved
Hide resolved
It also updates the parser to support multiple pairs of text parts and A2UI JSON parts, for example,
```
text part 1
<a2ui-json>
[{...}, {...}]
</a2ui-json>
text part 2
<a2ui-json>
[{...}, {...}]
</a2ui-json>
text part 3
<a2ui-json>
[{...}, {...}]
</a2ui-json>
```
- Tested: The orchestrator sample and all sub-agents are working as
expected.
Description
It also updates the parser to support multiple pairs of text parts and A2UI JSON parts, for example,
Pre-launch Checklist
If you need help, consider asking for advice on the discussion board.