feat(proxy): raw-passthrough /v1/chat/completions — fixes Codex / OpenAI tool calls#7
Open
KillerQueen-Z wants to merge 2 commits into
… tool calls) /v1/chat/completions went through the SDK's typed chat_completion_stream, which crashes on streamed tool calls: the strict ToolCall schema rejects streaming argument-fragment frames, _aiter_sse_chunks falls back to model_construct (leaving choices as raw dicts), and _aiter_and_archive then reads `.delta` on a dict — `'dict' object has no attribute 'delta'`. Any OpenAI client doing tool calls (Codex with wire_api=chat, etc.) hit this. Generalize the verbatim passthrough already used for /v1/messages (_forward_anthropic -> _forward_passthrough, taking a headers arg) and route /v1/chat/completions through it too. The body is forwarded byte-for-byte with only the x402 signature added, so the SDK's streaming/parsing/archiving is no longer in the hot path and streamed tool_calls survive intact. Keeps the semaphore gating and real-upstream-status handling from the Anthropic path. Verified on Solana with real paid models: /v1/chat/completions streaming + tools now returns tool_calls with full arguments and finish_reason=tool_calls (no crash); /v1/messages unchanged. 84 tests pass.
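The failure chain in this commit can be reproduced without the SDK. The sketch below is a stdlib-only stand-in that uses dataclasses, not the SDK's actual pydantic models. `strict_parse_choice`, `parse_with_fallback` and `archive_text` are hypothetical names that mimic, respectively, the strict `ToolCall` validation, the `model_construct` fallback in `_aiter_sse_chunks`, and the `.delta` access in `_aiter_and_archive`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Delta:
    """Hypothetical stand-in for the SDK's typed delta model."""
    content: Optional[str] = None


@dataclass
class Choice:
    """Hypothetical stand-in for a typed streaming choice."""
    delta: Delta


def strict_parse_choice(raw: dict) -> Choice:
    # Mimics the strict ToolCall schema: every tool call must carry an id,
    # a function name and arguments, which argument-fragment frames lack.
    for call in raw["delta"].get("tool_calls", []):
        fn = call.get("function", {})
        if "id" not in call or "name" not in fn or "arguments" not in fn:
            raise ValueError("tool-call fragment fails strict validation")
    return Choice(delta=Delta(content=raw["delta"].get("content")))


def parse_with_fallback(raw_choices: List[dict]) -> List[Any]:
    try:
        return [strict_parse_choice(c) for c in raw_choices]
    except ValueError:
        # Like model_construct(...): validation is skipped and nested
        # values are not converted, so each choice stays a plain dict.
        return list(raw_choices)


def archive_text(choices: List[Any]) -> str:
    # Like _aiter_and_archive: assumes every choice has a .delta attribute.
    return "".join(c.delta.content or "" for c in choices)


text_frame = [{"delta": {"content": "Hello"}}]
fragment_frame = [{"delta": {"tool_calls": [
    {"index": 0, "function": {"arguments": "{\"city\":"}}]}}]

print(archive_text(parse_with_fallback(text_frame)))  # Hello
try:
    archive_text(parse_with_fallback(fragment_frame))
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'delta'
```

Text frames validate and archive fine; the first argument fragment sends the choices down the unvalidated path, and the attribute access fails exactly as in the reported crash.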
Replacing the SDK-based /v1/chat/completions handler with a raw passthrough left _sse_event_stream (and _openai_error_event, a helper used only by it) with no callers. Remove them. The /v1/responses bridge keeps its own _responses_sse_stream and the shared _payment_error_* helpers, which are still in use.
Problem
`/v1/chat/completions` still went through the SDK's typed `chat_completion_stream`, which crashes on streamed tool calls with `'dict' object has no attribute 'delta'`.

Root cause (confirmed by reproducing it on Solana with a real model):

1. The `ToolCall` schema requires `id`, `function.name` and `arguments`, but streaming tool-call argument-fragment frames (no `id`/`name`, partial `arguments`) don't satisfy it.
2. `_aiter_sse_chunks` therefore falls back to `ChatCompletionChunk.model_construct(...)`, which doesn't parse nested models, so `choices` stay as raw dicts.
3. `_aiter_and_archive` then reads `choice.delta.content` on a dict → crash.

Plain chat works (text frames validate fine); any tool call breaks. This hits every OpenAI client doing tool use, notably Codex with `wire_api=chat`.

Fix: extend the verbatim passthrough to `/v1/chat/completions`

#6 made `/v1/messages` a byte-for-byte x402-signed passthrough. This PR generalizes that helper (`_forward_anthropic` → `_forward_passthrough`, taking a `headers` arg) and routes `/v1/chat/completions` through it too.

- The body is forwarded byte-for-byte with only the x402 signature added, so the SDK's streaming/parsing/archiving is out of the hot path and streamed `tool_calls` survive intact.
- Semaphore gating and real-upstream-status handling carry over from the Anthropic path.
- Anthropic clients (`/v1/messages`) and Codex / OpenAI clients (`/v1/chat/completions`) are now both served as pure signing passthroughs.

This is the structural direction: the proxy is a thin x402 signer; the gateway owns the protocol. With no edge translation, SDK streaming bugs can't surface in the proxy.
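A minimal sketch of what a generalized passthrough helper does, under stated assumptions: the HTTP transport (`send`) and the x402 signer (`sign`) are injected as hypothetical callables, the payment header name `X-PAYMENT` is an assumption, and `forward_passthrough` / `UpstreamResponse` are illustrative names rather than the proxy's real code. A production version would stream the upstream body back to the client; this one returns it buffered to stay short.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Mapping


@dataclass
class UpstreamResponse:
    """Illustrative container for what the gateway sent back."""
    status: int
    body: bytes


# Hypothetical injected dependencies (not the proxy's real API).
Signer = Callable[[str, bytes], str]
Sender = Callable[[str, bytes, Dict[str, str]], Awaitable[UpstreamResponse]]

PAYMENT_HEADER = "X-PAYMENT"  # assumption: name of the x402 payment header


async def forward_passthrough(
    path: str,
    body: bytes,
    headers: Mapping[str, str],
    *,
    sign: Signer,
    send: Sender,
    semaphore: asyncio.Semaphore,
) -> UpstreamResponse:
    # Copy the client's headers and add only the x402 signature.
    out_headers = dict(headers)
    out_headers[PAYMENT_HEADER] = sign(path, body)
    # Gate concurrent upstream calls; the body bytes are never parsed.
    async with semaphore:
        response = await send(path, body, out_headers)
    # Return the real upstream status instead of remapping it.
    return response


async def demo() -> None:
    seen = {}

    def fake_sign(path: str, body: bytes) -> str:
        return f"sig:{path}:{len(body)}"

    async def fake_send(path, body, headers):
        seen[path] = (body, headers)
        return UpstreamResponse(status=200, body=b"data: [DONE]\n\n")

    sem = asyncio.Semaphore(4)
    raw = b'{"model":"m","stream":true,"tools":[]}'
    # Same helper, different per-route headers: the only thing that varies.
    for path, hdrs in [
        ("/v1/chat/completions", {"content-type": "application/json"}),
        ("/v1/messages", {"content-type": "application/json",
                          "anthropic-version": "2023-06-01"}),
    ]:
        resp = await forward_passthrough(path, raw, hdrs, sign=fake_sign,
                                         send=fake_send, semaphore=sem)
        print(path, resp.status, seen[path][0] == raw)


asyncio.run(demo())
```

Because the helper never decodes the body, a streamed tool-call fragment is just bytes to it, which is why the schema problem above cannot occur on this path.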
Verification (real, end-to-end)
On Solana (`sol.blockrun.ai`) with real paid models:

- `/v1/chat/completions`, stream + tools: previously failed with `'dict' object has no attribute 'delta'`; now returns `tool_calls` with full args `{"city":"Madrid"}` and `finish_reason=tool_calls`.
- `/v1/messages`, stream + tools: unchanged.
- `pytest`: 84 passed (1 pre-existing litellm-version canary deselected). Compile clean.

Note on `/v1/responses` (Codex Responses API)

The gateway has no native `/v1/responses`, so it can't be a verbatim passthrough; the existing bridge stays as-is. The recommended Codex config is `wire_api = "chat"`, which routes through the fixed `/v1/chat/completions`. A native gateway `/v1/responses` would be the clean long-term answer for `wire_api=responses`.
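One way to check the `/v1/chat/completions` result from the verification above is to reassemble the streamed tool-call fragments client-side. This sketch assumes the standard OpenAI chat-completions streaming shape (`data:` SSE lines, `delta.tool_calls[i].index`, `function.arguments` split across frames, a final `data: [DONE]`); `collect_tool_calls` is a hypothetical helper, and the frames and the `get_weather` tool name are made-up sample data, not captured gateway output.

```python
import json
from typing import Dict, Iterable, List, Tuple


def collect_tool_calls(sse_lines: Iterable[str]) -> Tuple[List[dict], str]:
    """Merge streamed tool-call fragments by index; return (calls, finish_reason)."""
    calls: Dict[int, dict] = {}
    finish_reason = ""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank lines, comments and event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": "", "arguments": ""})
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                # Only the first fragment carries the name; later ones add args.
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return [calls[i] for i in sorted(calls)], finish_reason


sample = [
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}',
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\\"city\\":"}}]}}]}',
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\\"Madrid\\"}"}}]}}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
    "data: [DONE]",
]

calls, reason = collect_tool_calls(sample)
print(calls[0]["name"], calls[0]["arguments"], reason)
# get_weather {"city":"Madrid"} tool_calls
```

A passing run shows complete, JSON-parseable arguments and `finish_reason` equal to `tool_calls`; with the old SDK path the stream never got this far.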