feat(openai): Qwen3 (qwen3_coder/qwen3_xml) tool-call support by carlushuang · Pull Request #1319 · ROCm/ATOM

carlushuang · 2026-06-23T03:12:22Z

Qwen3 (qwen3_coder / qwen3_xml) tool-call support

ATOM's OpenAI/Anthropic servers only parsed the Kimi-K2 tool-call token format (<|tool_calls_section_begin|>...). Qwen3.5/Qwen3.6 emit tool calls as qwen3_coder XML (<tool_call><function=NAME><parameter=PNAME>VALUE</parameter></function></tool_call>), so those calls were returned as plain text and never surfaced as structured tool_calls. As a result, agent frontends (qwen-code, OpenCode, Cline, etc.) could not drive tools against a Qwen3.x model served by ATOM: the model would request a tool, but nothing executed.

This adds Qwen3 XML support alongside the existing Kimi format, auto-detected from the output, mirroring the qwen3_coder/qwen3_xml parsers in vLLM and SGLang.
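As a rough illustration of what "auto-detected from the output" could look like, the sketch below picks a format by checking which opening marker appears first in the generated text. The two marker strings come from the description above; the function and constant names are hypothetical and are not taken from ATOM's code.

```python
from typing import Optional

# Marker strings quoted in the PR description. The names of these constants
# and of the function below are illustrative assumptions, not ATOM's API.
KIMI_SECTION_BEGIN = "<|tool_calls_section_begin|>"
QWEN3_TOOL_CALL_OPEN = "<tool_call>"


def detect_tool_call_format(text: str) -> Optional[str]:
    """Return "kimi", "qwen3_xml", or None, based on the earliest marker found."""
    positions = {
        "kimi": text.find(KIMI_SECTION_BEGIN),
        "qwen3_xml": text.find(QWEN3_TOOL_CALL_OPEN),
    }
    # str.find returns -1 when the marker is absent; drop those entries.
    found = {name: pos for name, pos in positions.items() if pos != -1}
    if not found:
        return None
    # The format whose marker occurs first in the output wins.
    return min(found, key=found.get)
```

Output with no marker yields None, so plain-text responses would pass through untouched under this scheme.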

Changes

- tool_parser.py: parse <tool_call>/<function=>/<parameter=> into OpenAI tool_calls. Because the XML is typeless, parameter values are coerced to their declared JSON-Schema type using the request's tools (int/float/bool/null/object/array), otherwise left as strings. Both non-streaming and streaming are supported; for streaming, content is streamed normally and the <tool_call> block is buffered and parsed when complete, which sidesteps the partial-XML streaming bugs reported against vLLM/SGLang (e.g. invalid JSON for multiple <function=> per <tool_call>). The Kimi token path is unchanged.
- protocol.py: in to_template_dict, deserialize tool_calls[].function.arguments (a JSON string in OpenAI requests) into a mapping. Multi-turn chat templates that iterate tool_call.arguments.items() (Qwen, Hermes) otherwise raise TypeError: Can only get item pairs from a mapping on the turn after a tool call, breaking agentic loops. This matches how vLLM/SGLang deserialize arguments before applying the chat template.
- serving_chat.py / api_server.py: thread the request's tools into the parsers for type coercion. The new parameter defaults to None, preserving existing behavior for callers that don't pass it.
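The non-streaming parse and the schema-driven coercion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tool_parser.py: the function names, the regular expressions, and the choice to strip surrounding newlines from values are all hypothetical, only single-string JSON-Schema `type` values are handled, and tool-call ids are left out for brevity.

```python
import json
import re
from typing import Any, Optional

# Hypothetical patterns for the qwen3_coder XML shape quoted in the description.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def coerce(value: str, schema_type: Optional[str]) -> Any:
    """Coerce a raw XML string to its declared JSON-Schema type, else keep the string."""
    try:
        if schema_type == "integer":
            return int(value)
        if schema_type == "number":
            return float(value)
        if schema_type == "boolean" and value.lower() in ("true", "false"):
            return value.lower() == "true"
        if schema_type == "null" and value.lower() == "null":
            return None
        if schema_type in ("object", "array"):
            return json.loads(value)
    except ValueError:  # also covers json.JSONDecodeError
        pass
    return value  # unknown type or failed conversion: leave as a string


def param_types(tools: Optional[list], name: str) -> dict:
    """Look up {parameter name: declared type} for one function in the request's tools."""
    for tool in tools or []:
        fn = tool.get("function", {})
        if fn.get("name") == name:
            props = fn.get("parameters", {}).get("properties", {})
            return {key: spec.get("type") for key, spec in props.items()}
    return {}


def parse_qwen3_xml(text: str, tools: Optional[list] = None) -> list:
    """Extract OpenAI-style tool_calls from completed <tool_call> blocks."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        # A single <tool_call> may contain several <function=...> entries.
        for name, body in FUNCTION_RE.findall(block):
            types = param_types(tools, name)
            args = {
                pname: coerce(raw.strip("\n"), types.get(pname))
                for pname, raw in PARAMETER_RE.findall(body)
            }
            calls.append({
                "type": "function",
                # OpenAI responses carry arguments as a JSON string.
                "function": {"name": name, "arguments": json.dumps(args)},
            })
    return calls
```

With `tools=None` every value stays a string, which mirrors the "defaults to None, preserving existing behavior" note in the last item.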

Verification

Tested end-to-end with qwen-code pointed at ATOM serving Qwen3.6-27B BF16 (OpenAI endpoint, streaming): the agent issues a write_file tool call (file created with correct content) and a follow-up run-shell tool call, then reports the program's output — the full multi-turn loop completes with no 500s. Before this change the same run produced no file (tool call leaked as text) and 500'd on the next turn. The Kimi-K2 path is unchanged and the new tools argument is optional.
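The 500 on the follow-up turn traces back to the TypeError described under Changes: the chat template iterates `arguments.items()`, but OpenAI requests carry `arguments` as a JSON string. A minimal sketch of the deserialization step follows; the helper name is hypothetical, and the actual logic lives in `to_template_dict` in protocol.py, whose exact shape is not shown in this PR description.

```python
import copy
import json
from typing import Any


def deserialize_tool_call_arguments(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `message` whose tool_calls[].function.arguments are mappings."""
    out = copy.deepcopy(message)  # do not mutate the caller's request object
    for call in out.get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments")
        if isinstance(args, str):
            try:
                parsed = json.loads(args)
            except json.JSONDecodeError:
                continue  # leave malformed arguments untouched
            if isinstance(parsed, dict):
                fn["arguments"] = parsed
    return out
```

After this step a template can call `.items()` on each tool call's arguments when rendering tool history.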

Reference

Format and type-coercion semantics follow Qwen's Apache-2.0 qwen3coder_tool_parser.py and the vLLM/SGLang qwen3_coder/qwen3_xml parsers.

ATOM's OpenAI/Anthropic servers previously only parsed the Kimi-K2 tool-call token format (<|tool_calls_section_begin|>...), so Qwen3.5/Qwen3.6 tool calls -- emitted as qwen3_coder XML (<tool_call><function=NAME><parameter=...>) -- were returned as plain text and never surfaced as structured tool_calls. Agent frontends (qwen-code, OpenCode, etc.) therefore could not drive tools.

Add Qwen3 XML parsing alongside the Kimi format, auto-detected:

- tool_parser.py: parse <tool_call>/<function=>/<parameter=> into OpenAI tool_calls, with JSON-Schema type coercion of parameter values from the request's tools (the XML is typeless). Non-streaming + streaming (stream content, then buffer+parse the tool-call block -- robust against the partial-XML streaming edge cases seen in vLLM/SGLang). Kimi path unchanged.
- protocol.py: deserialize tool_calls[].function.arguments (a JSON string in OpenAI requests) to a mapping in to_template_dict, so multi-turn chat templates that iterate arguments.items() (Qwen, Hermes) render tool history instead of raising "Can only get item pairs from a mapping".
- serving_chat.py / api_server.py: thread the request's tools into the parsers for type coercion (default None preserves existing behavior).

Verified: Qwen3.6-27B BF16 served by ATOM drives qwen-code end-to-end on gfx1151 -- write_file + run-shell tool calls execute and the agent reports the program output.

The previous commit's threading of request.tools also touched the stream_completion_response / stream_completion_response_fanout calls in the /v1/completions handler. CompletionRequest has no `tools` field, so /v1/completions raised "AttributeError: 'CompletionRequest' object has no attribute 'tools'" (HTTP 500). Tool calling only applies to chat; drop tools from the text-completion stream calls.

The parser generated ids from a per-response index (call_0, call_1, ...), so the first tool call in every assistant turn was call_0. OpenAI tool-call ids must be unique across the whole conversation; agentic clients (e.g. qwen-code) dedupe by id and silently ignore every repeat -> the tool never executes and the model retries forever (endless tool-call loop on any multi-tool task). Use a random call_<uuid> id at both the non-streaming and streaming emit sites.
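A minimal sketch of the id change described in this commit, assuming the hex form of a version-4 UUID; the helper name and the exact id layout are illustrative, since the commit message only says "call_<uuid>".

```python
import uuid


def make_tool_call_id() -> str:
    """Return an id of the form call_<uuid>, random rather than index-based."""
    # uuid4 is random, so ids do not restart at call_0 on every assistant turn.
    return f"call_{uuid.uuid4().hex}"
```

Calling the same helper from both the streaming and non-streaming emit sites would keep the id format consistent between the two paths.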

carlushuang added 2 commits June 23, 2026 03:11

carlushuang mentioned this pull request Jun 24, 2026

[gfx1151] Online INT8 W8A8 for Qwen3.6 27B / 35B-A3B on RDNA3.5, with working MTP #1337

Open

carlushuang mentioned this pull request Jun 24, 2026

[gfx1151] Qwen3.5/3.6 (GDN hybrid) BF16 on RDNA3.5 via native Triton attention #1314

Open

carlushuang wants to merge 3 commits into main from carhuang/qwen3_xml_tool_parser
