feat(openai): Qwen3 (qwen3_coder/qwen3_xml) tool-call support#1319
Open
carlushuang wants to merge 3 commits into
Open
feat(openai): Qwen3 (qwen3_coder/qwen3_xml) tool-call support#1319carlushuang wants to merge 3 commits into
carlushuang wants to merge 3 commits into
Conversation
ATOM's OpenAI/Anthropic servers previously only parsed the Kimi-K2 tool-call token format (<|tool_calls_section_begin|>...), so Qwen3.5/Qwen3.6 tool calls -- emitted as qwen3_coder XML (<tool_call><function=NAME><parameter=...>) -- were returned as plain text and never surfaced as structured tool_calls. Agent frontends (qwen-code, OpenCode, etc.) therefore could not drive tools. Add Qwen3 XML parsing alongside the Kimi format, auto-detected: - tool_parser.py: parse <tool_call>/<function=>/<parameter=> into OpenAI tool_calls, with JSON-Schema type coercion of parameter values from the request's tools (the XML is typeless). Non-streaming + streaming (stream content, then buffer+parse the tool-call block -- robust against the partial-XML streaming edge cases seen in vLLM/SGLang). Kimi path unchanged. - protocol.py: deserialize tool_calls[].function.arguments (a JSON string in OpenAI requests) to a mapping in to_template_dict, so multi-turn chat templates that iterate arguments.items() (Qwen, Hermes) render tool history instead of raising "Can only get item pairs from a mapping". - serving_chat.py / api_server.py: thread the request's tools into the parsers for type coercion (default None preserves existing behavior). Verified: Qwen3.6-27B BF16 served by ATOM drives qwen-code end-to-end on gfx1151 -- write_file + run-shell tool calls execute and the agent reports the program output.
The previous commit's threading of request.tools matched the stream_completion_response / stream_completion_response_fanout calls in the /v1/completions handler too. CompletionRequest has no `tools` field, so /v1/completions raised "AttributeError: 'CompletionRequest' object has no attribute 'tools'" (HTTP 500). Tool calling only applies to chat; drop tools from the text-completion stream calls.
The parser generated ids from a per-response index (call_0, call_1, ...), so the first tool call in every assistant turn was call_0. OpenAI tool-call ids must be unique across the whole conversation; agentic clients (e.g. qwen-code) dedupe by id and silently ignore every repeat -> the tool never executes and the model retries forever (endless tool-call loop on any multi-tool task). Use a random call_<uuid> id at both the non-streaming and streaming emit sites.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Qwen3 (qwen3_coder / qwen3_xml) tool-call support
ATOM's OpenAI/Anthropic servers only parsed the Kimi-K2 tool-call token format (
<|tool_calls_section_begin|>...). Qwen3.5/Qwen3.6 emit tool calls as qwen3_coder XML (<tool_call><function=NAME><parameter=PNAME>VALUE</parameter></function></tool_call>), so those calls were returned as plain text and never surfaced as structuredtool_calls. As a result agent frontends (qwen-code, OpenCode, Cline, etc.) could not drive tools against a Qwen3.x model served by ATOM — the model would request a tool, but nothing executed.This adds Qwen3 XML support alongside the existing Kimi format, auto-detected from the output, mirroring the
qwen3_coder/qwen3_xmlparsers in vLLM and SGLang.Changes
tool_parser.py— parse<tool_call>/<function=>/<parameter=>into OpenAItool_calls. Because the XML is typeless, parameter values are coerced to their declared JSON-Schema type using the request'stools(int/float/bool/null/object/array), otherwise left as strings. Both non-streaming and streaming are supported; for streaming, content is streamed normally and the<tool_call>block is buffered and parsed when complete — which sidesteps the partial-XML streaming bugs reported against vLLM/SGLang (e.g. invalid JSON for multiple<function=>per<tool_call>). The Kimi token path is unchanged.protocol.py— into_template_dict, deserializetool_calls[].function.arguments(a JSON string in OpenAI requests) into a mapping. Multi-turn chat templates that iteratetool_call.arguments.items()(Qwen, Hermes) otherwise raiseTypeError: Can only get item pairs from a mappingon the turn after a tool call, breaking agentic loops. This matches how vLLM/SGLang deserialize arguments before applying the chat template.serving_chat.py/api_server.py— thread the request'stoolsinto the parsers for type coercion. The new parameter defaults toNone, preserving existing behavior for callers that don't pass it.Verification
Tested end-to-end with qwen-code pointed at ATOM serving Qwen3.6-27B BF16 (OpenAI endpoint, streaming): the agent issues a
write_filetool call (file created with correct content) and a follow-up run-shell tool call, then reports the program's output — the full multi-turn loop completes with no500s. Before this change the same run produced no file (tool call leaked as text) and 500'd on the next turn. The Kimi-K2 path is unchanged and the newtoolsargument is optional.Reference
Format and type-coercion semantics follow Qwen's Apache-2.0
qwen3coder_tool_parser.pyand the vLLM/SGLangqwen3_coder/qwen3_xmlparsers.