feat(proxy): raw-passthrough /v1/chat/completions — fixes Codex / OpenAI tool calls#7

Open
KillerQueen-Z wants to merge 2 commits into main from feat/chat-completions-raw-passthrough

@KillerQueen-Z

Collaborator

Problem

/v1/chat/completions still went through the SDK's typed chat_completion_stream, which crashes on streamed tool calls:

File ".../blockrun_llm/solana_client.py", _aiter_and_archive
    if choice.delta.content:
AttributeError: 'dict' object has no attribute 'delta'

Root cause (reproduced on Solana with a real model):

  1. The SDK's ToolCall schema requires id / function.name / arguments, but streaming tool-call argument-fragment frames (id/name absent, partial args) don't satisfy it.
  2. _aiter_sse_chunks therefore falls back to ChatCompletionChunk.model_construct(...), which doesn't parse nested models → choices stay as raw dicts.
  3. _aiter_and_archive then does choice.delta.content on a dict → crash.
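
The three steps above can be reproduced in miniature. This is a minimal sketch using stdlib dataclasses as stand-ins for the SDK's typed models: `Delta`, `Choice`, `Chunk`, `strict_tool_call`, `parse_chunk`, and `read_content` are hypothetical names for illustration, not the SDK's actual code. It only shows how a strict schema plus an unparsed fallback leaves raw dicts where typed objects are expected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    content: Optional[str] = None
    tool_calls: Optional[list] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: list


def strict_tool_call(tc: dict) -> dict:
    """Stand-in for a strict schema: id, function.name, function.arguments required."""
    fn = tc.get("function") or {}
    if "id" not in tc or "name" not in fn or "arguments" not in fn:
        raise ValueError("incomplete tool call")
    return tc


def parse_chunk(raw: dict) -> Chunk:
    """Validate strictly; on failure, fall back to an unparsed construct."""
    try:
        choices = []
        for c in raw["choices"]:
            d = c["delta"]
            calls = [strict_tool_call(tc) for tc in d.get("tool_calls") or []]
            choices.append(Choice(Delta(d.get("content"), calls or None)))
        return Chunk(choices)
    except ValueError:
        # The fallback skips nested parsing, so choices stay raw dicts.
        return Chunk(raw["choices"])


def read_content(chunk: Chunk) -> list:
    """Consumer that assumes every choice is a typed object."""
    return [choice.delta.content for choice in chunk.choices]


# A plain text frame validates; an argument-fragment frame (no id/name) does not.
text_frame = {"choices": [{"delta": {"content": "Hi"}}]}
fragment_frame = {
    "choices": [
        {"delta": {"tool_calls": [{"index": 0, "function": {"arguments": '{"ci'}}]}}
    ]
}

print(read_content(parse_chunk(text_frame)))
try:
    read_content(parse_chunk(fragment_frame))
except AttributeError as exc:
    print(exc)
```

The text frame takes the typed path, while the fragment frame takes the fallback and fails only later, at the attribute access in the consumer.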

Plain chat works (text frames validate fine); any tool call breaks. This hits every OpenAI client doing tool use — notably Codex with wire_api=chat.

Fix — extend the verbatim passthrough to /v1/chat/completions

#6 made /v1/messages a byte-for-byte x402-signed passthrough. This generalizes that helper (_forward_anthropic → _forward_passthrough, taking a headers arg) and routes /v1/chat/completions through it too.

  • Body forwarded byte-for-byte, only the x402 signature added → the SDK's streaming/parsing/archiving is no longer in the hot path, so the crash can't be reached and streamed tool_calls survive intact.
  • Reuses the merged Anthropic path's semaphore gating + real-upstream-status handling (no more unconditional 200 on upstream errors).
  • One sidecar now serves both Claude Code (/v1/messages) and Codex / OpenAI clients (/v1/chat/completions), each as a pure signing passthrough.
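
As a rough illustration of "body forwarded byte-for-byte, only the signature added", the sketch below builds an upstream request without ever decoding the body. Everything here is an assumption for illustration: `build_upstream_request`, `sign_placeholder`, and the `X-Payment-Signature` header name are hypothetical, and the HMAC is only a stand-in, since real x402 signing produces a payment payload rather than an HMAC digest.

```python
import hashlib
import hmac


def sign_placeholder(body: bytes, key: bytes) -> str:
    # Stand-in only: NOT the real x402 scheme, just something deterministic.
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def build_upstream_request(body: bytes, headers: dict, key: bytes):
    """Return (headers, body) for the upstream call.

    The body is never parsed or re-serialized, so whatever the client
    sent (including tool definitions) reaches the gateway unchanged.
    """
    out = dict(headers)  # copy so the caller's headers are not mutated
    out["X-Payment-Signature"] = sign_placeholder(body, key)  # hypothetical header
    return out, body


# Hypothetical client request body; the tool name is illustrative.
raw = b'{"model":"m","stream":true,"tools":[{"type":"function","function":{"name":"get_weather"}}]}'
client_headers = {"Content-Type": "application/json"}

upstream_headers, upstream_body = build_upstream_request(raw, client_headers, b"demo-key")
print(upstream_body == raw)
print(sorted(upstream_headers))
```

Because the proxy never constructs typed objects from the payload, there is no schema for a streamed frame to fail against.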

This is the structural direction: the proxy is a thin x402 signer; the gateway owns the protocol. No edge translation, so SDK streaming bugs can't surface in the proxy.
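
The semaphore gating and real-upstream-status handling mentioned in the list above could look roughly like this. It is a sketch under stated assumptions: `fake_upstream` and `forward` are hypothetical names, the upstream is simulated, and the concurrency limit of 2 is arbitrary.

```python
import asyncio


async def fake_upstream(body: bytes):
    # Hypothetical upstream: an empty body is rejected with 402, anything else echoes.
    await asyncio.sleep(0)
    if not body:
        return 402, b'{"error":"payment required"}'
    return 200, body


async def forward(body: bytes, gate: asyncio.Semaphore, upstream=fake_upstream):
    async with gate:  # cap the number of concurrent upstream calls
        status, payload = await upstream(body)
    # Relay the upstream status as-is instead of always answering 200.
    return status, payload


async def main():
    gate = asyncio.Semaphore(2)  # arbitrary limit for the sketch
    return await asyncio.gather(forward(b"x", gate), forward(b"", gate))


results = asyncio.run(main())
print(results)
```

A client that receives the real status can tell a payment or upstream failure apart from a successful but empty stream.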

Verification (real, end-to-end)

On Solana (sol.blockrun.ai) with real paid models:

| Endpoint | Before | After |
| --- | --- | --- |
| /v1/chat/completions stream + tools | 'dict' object has no attribute 'delta' | tool_calls + full args {"city":"Madrid"} + finish_reason=tool_calls |
| /v1/messages stream + tools | | ✅ (unchanged) |
| non-stream chat | | |

pytest: 84 passed (1 pre-existing litellm-version canary deselected). Compile clean.
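
For context on what "tool_calls + full args" means on the client side, here is a sketch of how an OpenAI-style client might reassemble a tool call from streamed delta frames. The frames, the `call_1` id, and the `get_weather` name are made-up illustrations, not captured output from the verification run; `accumulate_tool_calls` is a hypothetical helper.

```python
import json


def accumulate_tool_calls(frames):
    """Merge streamed tool-call fragments by index; return (calls, finish_reason)."""
    calls = {}
    finish = None
    for frame in frames:
        choice = frame["choices"][0]
        for tc in choice["delta"].get("tool_calls") or []:
            slot = calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            # Later frames carry only partial argument text, with no id or name.
            slot["arguments"] += fn.get("arguments") or ""
        finish = choice.get("finish_reason") or finish
    return [calls[i] for i in sorted(calls)], finish


# Illustrative frames: the first carries id and name, the rest only argument text.
frames = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city":'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Madrid"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

calls, finish = accumulate_tool_calls(frames)
print(json.loads(calls[0]["arguments"]), finish)
```

The middle frames are exactly the id-less, name-less fragments that a strict per-frame schema rejects, which is why the client needs them delivered untouched.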

Note on /v1/responses (Codex Responses API)

The gateway has no native /v1/responses, so it can't be a verbatim passthrough; the existing bridge stays as-is. Recommended Codex config is wire_api = "chat" → routes through this fixed /v1/chat/completions. A native gateway /v1/responses would be the clean long-term answer for wire_api=responses.

… tool calls)

/v1/chat/completions went through the SDK's typed chat_completion_stream, which
crashes on streamed tool calls: the strict ToolCall schema rejects streaming
argument-fragment frames, _aiter_sse_chunks falls back to model_construct
(leaving choices as raw dicts), and _aiter_and_archive then reads `.delta` on a
dict — `'dict' object has no attribute 'delta'`. Any OpenAI client doing tool
calls (Codex with wire_api=chat, etc.) hit this.

Generalize the verbatim passthrough already used for /v1/messages
(_forward_anthropic -> _forward_passthrough, taking a headers arg) and route
/v1/chat/completions through it too. The body is forwarded byte-for-byte with
only the x402 signature added, so the SDK's streaming/parsing/archiving is no
longer in the hot path and streamed tool_calls survive intact. Keeps the
semaphore gating and real-upstream-status handling from the Anthropic path.

Verified on Solana with real paid models: /v1/chat/completions streaming +
tools now returns tool_calls with full arguments and finish_reason=tool_calls
(no crash); /v1/messages unchanged. 84 tests pass.

Replacing the SDK-based /v1/chat/completions handler with a raw passthrough
left _sse_event_stream (and its only-consumer helper _openai_error_event) with
no callers. Remove them. The /v1/responses bridge keeps its own
_responses_sse_stream + the shared _payment_error_* helpers, which are still used.