fix(onboard): force chat completions API for vLLM and NIM-local providers #980
BenediktSchackenberg wants to merge 6 commits into NVIDIA:main
Conversation
vLLM's `/v1/responses` endpoint does not run the `--tool-call-parser`, so tool calls arrive as raw text in the response content instead of the structured `tool_calls` array. The probe picks `openai-responses` because vLLM accepts the request, but parsing only works on `/v1/chat/completions`. Override `preferredInferenceApi` to `openai-completions` after validation for both `vllm` and `nim-local` provider paths. Fixes NVIDIA#976
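A minimal sketch of the override described above (the function and variable names here are illustrative assumptions, not the actual `onboard.js` API):

```javascript
// Hypothetical helper: decide which inference API to bake into the sandbox
// config. `probedApi` is whatever API the endpoint probe reported first.
function resolvePreferredInferenceApi(provider, probedApi) {
  // vLLM's /v1/responses path bypasses --tool-call-parser, so tool calls
  // would arrive as raw text. Force chat completions for these providers.
  const forceCompletions = new Set(["vllm", "nim-local"]);
  return forceCompletions.has(provider) ? "openai-completions" : probedApi;
}

console.log(resolvePreferredInferenceApi("vllm", "openai-responses"));
// -> "openai-completions"

// Other providers keep whatever the probe preferred:
console.log(resolvePreferredInferenceApi("openai", "openai-responses"));
// -> "openai-responses"
```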
📝 Walkthrough

The PR forces `preferredInferenceApi` to `openai-completions` after probe validation for the `vllm` and `nim-local` provider paths.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Onboard as Onboard script
    participant Probe as Endpoint probe
    participant Config as Sandbox config
    participant Client as Agent/Client
    participant vLLM as vLLM/NIM local
    participant ToolParser as Tool-Call Parser
    participant Tool as Tool/Executor
    Onboard->>Probe: probe /v1/responses and /v1/models
    Probe-->>Onboard: reports openai-responses available
    Onboard->>Config: set preferredInferenceApi = "openai-completions" (override)
    Client->>Config: send inference request
    Config->>vLLM: forward to /v1/chat/completions
    vLLM->>ToolParser: produce structured tool_call data
    ToolParser->>Tool: execute tool call
    Tool-->>Client: return tool result via chat response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
`1869-1871`: **Add a regression test for the responses-first probe path.** Current coverage only proves that `getSandboxInferenceConfig()` respects an explicit preferred API. It does not lock down the new behavior where `setupNim()` must return `openai-completions` for `vllm` and `nim-local` even when the probe reports `openai-responses` first. A focused test here would make this fix much harder to regress.

Also applies to: 1988-1991
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 1869 - 1871, Add a regression test that simulates the probe returning "openai-responses" first and verifies that setupNim() (and indirectly getSandboxInferenceConfig()) forces preferredInferenceApi to "openai-completions" for engines "vllm" and "nim-local"; specifically, mock the probe result to return openai-responses, call setupNim() for both engine types, and assert the returned/derived preferredInferenceApi equals "openai-completions" to lock down the responses-first probe path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1869-1871: After forcing preferredInferenceApi =
"openai-completions" add an explicit message into the onboarding output so the
override is echoed to the user; locate the code that previously logged the
probed API (validateOpenAiLikeSelection()) and the onboarding message builder
that prints the selected provider, and append or update that onboarding text to
include that OpenClaw is forcing "openai-completions" (use the same variable
preferredInferenceApi). Do the same change for the other identical override site
later in the file where preferredInferenceApi is set (the second block that
mirrors this behavior) so both onboarding outputs reflect the forced override.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0342752e-7456-4fa0-b3c8-550c9219f25d
📒 Files selected for processing (1)
bin/lib/onboard.js
Surface the API override during onboard so users see why responses API was not selected.
Verifies that setupNim() forces preferredInferenceApi to openai-completions for the vLLM provider path even when the probe detects openai-responses first. This locks down the fix for NVIDIA#976 so the tool-call-parser override cannot silently regress.
Added a regression test as suggested by CodeRabbit — it verifies that `setupNim()` forces `openai-completions` for the vLLM path even when the probe detects `openai-responses` first. Would appreciate if a maintainer could trigger CI on this — first-time contributor fork so it's waiting on approval. Thanks!
🧹 Nitpick comments (1)
test/onboard-selection.test.js (1)
`1308-1405`: **Add a companion test for `nim-local` override parity.** This PR objective covers both `vllm` and `nim-local`, but the new coverage here only guards `vllm`. A parallel `nim-local` case would prevent silent regressions in the second forced-override path.

♻️ Suggested follow-up (test skeleton)
```diff
+ it("forces openai-completions for nim-local even when probe detects openai-responses", () => {
+   // mirror the vLLM structure:
+   //  - select nim-local provider
+   //  - make probe return /v1/responses success
+   //  - assert preferredInferenceApi === "openai-completions"
+   //  - assert override log line is emitted
+ });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard-selection.test.js` around lines 1308 - 1405, Add a parallel test in the same file that mirrors the vLLM test but exercises the nim-local forced-override path: replicate the test that writes a fake curl and script invoking setupNim, but change the interactive answer sequence (via credentials.prompt) to select the nim-local option, ensure runner.runCapture responds to model probes similarly, then assert payload.result.provider === "nim-local", payload.result.preferredInferenceApi === "openai-completions", and payload.result.model matches the detected nim-local model; also assert the captured console lines include the equivalent "Using existing ..." log for nim-local and the "tool-call-parser requires" warning. Use the same helpers (setupNim, credentials.prompt, runner.runCapture) and test scaffolding as the vLLM case to guarantee parity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f569ae45-9c39-46c7-b645-5e4c322a6f5d
📒 Files selected for processing (1)
test/onboard-selection.test.js
@cv Would you mind triggering CI on this one? Added a regression test for the override as well. Thanks!
Companion test for the vLLM case: verifies that setupNim() forces openai-completions for the NIM-local path too, since NIM uses vLLM internally and has the same tool-call-parser limitation.
Added the companion NIM-local regression test as suggested by CodeRabbit — it mocks the full NIM onboarding flow (`listModels`, `pullNimImage`, `startNimContainerByName`, `waitForNimHealth`) and verifies the same `openai-completions` override applies. All 16 selection tests pass.
Summary

vLLM's `/v1/responses` endpoint does not run the `--tool-call-parser`, so tool calls arrive as raw text instead of populating the structured `tool_calls` array. During onboard, `probeOpenAiLikeEndpoint()` tries `/v1/responses` first — since vLLM accepts it, `preferredInferenceApi` gets set to `openai-responses` and baked into the sandbox config.

Fix

Override `preferredInferenceApi` to `openai-completions` after probe validation for both the `vllm` and `nim-local` provider paths in `bin/lib/onboard.js`. This routes inference through `/v1/chat/completions`, where the tool-call-parser works. The probe still runs (it validates that the endpoint is reachable); we just ignore its API preference for these two providers.
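Illustratively, the sandbox config that gets baked in would then carry the forced value (the field names below are assumptions based on this description, not the exact config schema):

```javascript
// Assumed shape of the baked-in sandbox inference config after the override.
const sandboxInferenceConfig = {
  provider: "vllm", // or "nim-local"
  // Probe reported "openai-responses", but we force chat completions so the
  // server-side --tool-call-parser can populate tool_calls:
  preferredInferenceApi: "openai-completions",
};

console.log(sandboxInferenceConfig.preferredInferenceApi);
// -> "openai-completions"
```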
Fixes #976