feat(openai): Qwen3 (qwen3_coder/qwen3_xml) tool-call support #1319

Open
carlushuang wants to merge 3 commits into main from carhuang/qwen3_xml_tool_parser

@carlushuang (Collaborator)

Qwen3 (qwen3_coder / qwen3_xml) tool-call support

ATOM's OpenAI/Anthropic servers previously parsed only the Kimi-K2 tool-call token format (<|tool_calls_section_begin|>...). Qwen3.5/Qwen3.6 emit tool calls as qwen3_coder XML (<tool_call><function=NAME><parameter=PNAME>VALUE</parameter></function></tool_call>), so those calls were returned as plain text and never surfaced as structured tool_calls. As a result, agent frontends (qwen-code, OpenCode, Cline, etc.) could not drive tools against a Qwen3.x model served by ATOM: the model would request a tool, but nothing executed.

This adds Qwen3 XML support alongside the existing Kimi format, auto-detected from the output, mirroring the qwen3_coder/qwen3_xml parsers in vLLM and SGLang.
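
To make the two wire formats concrete, here is a minimal non-streaming sketch of detecting the format and turning the Qwen3 XML into OpenAI-style tool_calls. It is not the code in tool_parser.py: the names `detect_format` and `parse_qwen3_xml`, the regular expressions, and the id scheme are assumptions made for illustration, and every parameter value is left as a string here.

```python
import json
import re
import uuid

KIMI_MARKER = "<|tool_calls_section_begin|>"  # marker quoted in the description
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def detect_format(text: str) -> str:
    """Hypothetical helper: pick a parser from markers found in the output."""
    if KIMI_MARKER in text:
        return "kimi"
    if "<tool_call>" in text:
        return "qwen3_xml"
    return "none"


def parse_qwen3_xml(text: str):
    """Return (plain content, OpenAI-style tool_calls); values stay strings."""
    tool_calls = []
    for block in TOOL_CALL_RE.finditer(text):
        # A single <tool_call> may carry more than one <function=...> element.
        for fn in FUNCTION_RE.finditer(block.group(1)):
            args = {
                name: value.strip()
                for name, value in PARAM_RE.findall(fn.group(2))
            }
            tool_calls.append({
                "id": f"call_{uuid.uuid4().hex}",  # assumed id scheme
                "type": "function",
                "function": {
                    "name": fn.group(1),
                    "arguments": json.dumps(args),  # OpenAI expects a JSON string
                },
            })
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, tool_calls
```

A regex pass like this only works once the whole block is available, which is why it suits the non-streaming path.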

Changes

  • tool_parser.py: parse <tool_call>/<function=>/<parameter=> into OpenAI tool_calls. Because the XML is typeless, parameter values are coerced to the JSON-Schema type declared in the request's tools (int/float/bool/null/object/array) and otherwise left as strings. Both non-streaming and streaming are supported; for streaming, content is streamed normally and the <tool_call> block is buffered and parsed once complete, which sidesteps the partial-XML streaming bugs reported against vLLM/SGLang (e.g. invalid JSON for multiple <function=> per <tool_call>). The Kimi token path is unchanged.
  • protocol.py: in to_template_dict, deserialize tool_calls[].function.arguments (a JSON string in OpenAI requests) into a mapping. Otherwise, multi-turn chat templates that iterate tool_call.arguments.items() (Qwen, Hermes) raise TypeError: Can only get item pairs from a mapping on the turn after a tool call, breaking agentic loops. This matches how vLLM/SGLang deserialize arguments before applying the chat template.
  • serving_chat.py / api_server.py: thread the request's tools into the parsers for type coercion. The new parameter defaults to None, preserving existing behavior for callers that don't pass it.
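
The type coercion described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `coerce_value` and `coerce_arguments` are hypothetical names, the tools are assumed to use the OpenAI `{"type": "function", "function": {...}}` shape, and the fallback rules (anything undeclared or uncoercible stays a string) are one reasonable reading of "otherwise left as strings".

```python
import json
from typing import Any, Optional


def coerce_value(raw: str, schema: Optional[dict]) -> Any:
    """Coerce a raw XML parameter string to its declared JSON-Schema type."""
    declared = (schema or {}).get("type")
    text = raw.strip()
    try:
        if declared == "integer":
            return int(text)
        if declared == "number":
            return float(text)
        if declared == "boolean":
            lowered = text.lower()
            return lowered == "true" if lowered in ("true", "false") else raw
        if declared == "null":
            return None if text.lower() in ("null", "none") else raw
        if declared in ("object", "array"):
            return json.loads(text)
    except ValueError:  # also covers json.JSONDecodeError
        pass
    return raw  # undeclared, string, or uncoercible values stay strings


def coerce_arguments(name: str, args: dict, tools: Optional[list]) -> dict:
    """Find `name` in the request's tools and coerce each argument."""
    for tool in tools or []:
        fn = tool.get("function", {})
        if fn.get("name") == name:
            props = fn.get("parameters", {}).get("properties", {})
            return {k: coerce_value(v, props.get(k)) for k, v in args.items()}
    return dict(args)  # tools is None or the function is unknown: no coercion
```

Passing `tools=None` returns the arguments unchanged, which mirrors the default-None behavior described for the new parameter.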

Verification

Tested end-to-end with qwen-code pointed at ATOM serving Qwen3.6-27B BF16 (OpenAI endpoint, streaming): the agent issues a write_file tool call (file created with correct content) and a follow-up run-shell tool call, then reports the program's output — the full multi-turn loop completes with no 500s. Before this change the same run produced no file (tool call leaked as text) and 500'd on the next turn. The Kimi-K2 path is unchanged and the new tools argument is optional.
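
The failure on the turn after a tool call is what the protocol.py change targets: the client sends the earlier tool call back with `arguments` as a JSON string, and a template that calls `.items()` on it fails. A minimal sketch of the deserialization step, assuming plain dict messages in the OpenAI chat shape (the function name is hypothetical, and the handling of malformed JSON is an assumption rather than something the PR states):

```python
import copy
import json


def deserialize_tool_call_arguments(message: dict) -> dict:
    """Return a copy of `message` whose tool_calls[].function.arguments are mappings."""
    out = copy.deepcopy(message)
    for call in out.get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments")
        if isinstance(args, str):
            try:
                parsed = json.loads(args) if args.strip() else {}
            except json.JSONDecodeError:
                continue  # assumption: leave malformed arguments untouched
            if isinstance(parsed, dict):
                fn["arguments"] = parsed
    return out
```

The input message is copied rather than mutated, so the original request object keeps the JSON-string form that the OpenAI wire format uses.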

Reference

Format and type-coercion semantics follow Qwen's Apache-2.0 qwen3coder_tool_parser.py and the vLLM/SGLang qwen3_coder/qwen3_xml parsers.

ATOM's OpenAI/Anthropic servers previously only parsed the Kimi-K2 tool-call
token format (<|tool_calls_section_begin|>...), so Qwen3.5/Qwen3.6 tool calls --
emitted as qwen3_coder XML (<tool_call><function=NAME><parameter=...>) -- were
returned as plain text and never surfaced as structured tool_calls. Agent
frontends (qwen-code, OpenCode, etc.) therefore could not drive tools.

Add Qwen3 XML parsing alongside the Kimi format, auto-detected:

- tool_parser.py: parse <tool_call>/<function=>/<parameter=> into OpenAI
  tool_calls, with JSON-Schema type coercion of parameter values from the
  request's tools (the XML is typeless). Non-streaming + streaming (stream
  content, then buffer+parse the tool-call block -- robust against the
  partial-XML streaming edge cases seen in vLLM/SGLang). Kimi path unchanged.
- protocol.py: deserialize tool_calls[].function.arguments (a JSON string in
  OpenAI requests) to a mapping in to_template_dict, so multi-turn chat
  templates that iterate arguments.items() (Qwen, Hermes) render tool history
  instead of raising "Can only get item pairs from a mapping".
- serving_chat.py / api_server.py: thread the request's tools into the parsers
  for type coercion (default None preserves existing behavior).
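
The "stream content, then buffer+parse" approach can be sketched as a small state holder. This is a sketch under assumptions, not the parser in tool_parser.py: the class name, the event tuples, and the hold-back rule for a partially received `<tool_call>` tag are all illustrative.

```python
START, END = "<tool_call>", "</tool_call>"


def _partial_suffix_len(text: str, token: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `token`."""
    for n in range(min(len(text), len(token) - 1), 0, -1):
        if text.endswith(token[:n]):
            return n
    return 0


class ToolCallStreamBuffer:
    """Pass content through; hold back a <tool_call> block until it closes."""

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, delta: str):
        """Return ("content", text) and ("tool_call", block) events for a delta."""
        self._buf += delta
        events = []
        while self._buf:
            start = self._buf.find(START)
            if start == -1:
                # Keep a tail that might be the beginning of "<tool_call>".
                keep = _partial_suffix_len(self._buf, START)
                cut = len(self._buf) - keep
                if cut:
                    events.append(("content", self._buf[:cut]))
                self._buf = self._buf[cut:]
                break
            if start > 0:
                events.append(("content", self._buf[:start]))
                self._buf = self._buf[start:]
            end = self._buf.find(END)
            if end == -1:
                break  # block still incomplete: keep buffering
            stop = end + len(END)
            events.append(("tool_call", self._buf[:stop]))
            self._buf = self._buf[stop:]
        return events

    def flush(self) -> str:
        """Return whatever is still buffered when the stream ends."""
        rest, self._buf = self._buf, ""
        return rest
```

Each completed block can then go through the same whole-block parser used for non-streaming responses, so no partial XML is ever converted to JSON.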

Verified: Qwen3.6-27B BF16 served by ATOM drives qwen-code end-to-end on
gfx1151 -- write_file + run-shell tool calls execute and the agent reports the
program output.

The previous commit's threading of request.tools also touched the
stream_completion_response / stream_completion_response_fanout calls in the
/v1/completions handler. CompletionRequest has no `tools` field, so
/v1/completions raised "AttributeError: 'CompletionRequest' object has no
attribute 'tools'" (HTTP 500). Tool calling applies only to chat, so drop tools
from the text-completion stream calls.

The parser generated ids from a per-response index (call_0, call_1, ...), so the
first tool call in every assistant turn was call_0. OpenAI tool-call ids must be
unique across the whole conversation; agentic clients (e.g. qwen-code) dedupe by
id and silently ignore every repeat, so the tool never executes and the model
retries forever (an endless tool-call loop on any multi-tool task). Use a random
call_<uuid> id at both the non-streaming and streaming emit sites.
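
A minimal sketch of the id change, assuming a helper shared by both emit sites. The helper name and the choice of the hex form of a UUID4 are illustrative; the commit only states that the id is a random call_<uuid>.

```python
import uuid


def new_tool_call_id() -> str:
    """Return an id that does not repeat across turns, unlike call_<index>."""
    return f"call_{uuid.uuid4().hex}"
```

Because the id no longer depends on the position of the call within one response, the first tool call of turn two cannot collide with the first tool call of turn one.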