
fix(onboard): force chat completions API for vLLM and NIM-local providers#980

Open
BenediktSchackenberg wants to merge 6 commits into NVIDIA:main from BenediktSchackenberg:fix/vllm-tool-call-api

Conversation

Contributor

@BenediktSchackenberg commented Mar 26, 2026

Summary

vLLM's /v1/responses endpoint does not run the --tool-call-parser, so tool calls arrive as raw text instead of populating the structured tool_calls array. During onboard, probeOpenAiLikeEndpoint() tries /v1/responses first — since vLLM accepts it, preferredInferenceApi gets set to openai-responses and baked into the sandbox config.
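To make the failure mode concrete, here is a sketch of the two message shapes involved. The structured shape follows the OpenAI-compatible chat-completions schema; the unparsed shape is only representative, since the exact raw-text wrapper depends on the model's chat template, and the function name and arguments are illustrative.

```javascript
// What a correctly parsed tool call looks like on /v1/chat/completions
// (OpenAI-compatible schema). Function name/arguments are illustrative.
const parsedMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    {
      id: "call_0",
      type: "function",
      function: { name: "read_file", arguments: '{"path":"README.md"}' },
    },
  ],
};

// Without the --tool-call-parser (as on /v1/responses), the same call
// arrives as raw model text; the exact wrapper varies by chat template,
// so this is only a representative example.
const unparsedMessage = {
  role: "assistant",
  content:
    '<tool_call>{"name":"read_file","arguments":{"path":"README.md"}}</tool_call>',
};

// A client that only inspects message.tool_calls sees no tool call at all
// in the unparsed case.
const hasStructuredToolCall = Array.isArray(parsedMessage.tool_calls);
const looksLikePlainText = unparsedMessage.tool_calls === undefined;
```

This is why the endpoint choice matters: both endpoints "work", but only one produces the `tool_calls` array that downstream tooling consumes.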

Fix

Override preferredInferenceApi to openai-completions after probe validation for both the vllm and nim-local provider paths in bin/lib/onboard.js. This routes inference through /v1/chat/completions where the tool-call-parser works.

The probe still runs (it validates that the endpoint is reachable); we simply ignore its API preference for these two providers.
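A minimal sketch of the override logic, under stated assumptions: the helper name and log text here are illustrative (the actual change lives inline in bin/lib/onboard.js after probe validation), and only `preferredInferenceApi` and the provider ids come from the PR.

```javascript
// Providers whose probed API preference must be overridden, per this PR.
const FORCE_COMPLETIONS = new Set(["vllm", "nim-local"]);

// Illustrative helper: the real change is inline in the two provider
// paths, not a standalone function.
function resolveInferenceApi(provider, probedApi) {
  if (FORCE_COMPLETIONS.has(provider) && probedApi !== "openai-completions") {
    // vLLM accepts /v1/responses, but only /v1/chat/completions runs the
    // --tool-call-parser, so the probe's preference is ignored here.
    console.log(
      `${provider}: overriding probed API "${probedApi}" with ` +
        `"openai-completions" so the tool-call parser is applied`
    );
    return "openai-completions";
  }
  return probedApi;
}
```

With this shape, `resolveInferenceApi("vllm", "openai-responses")` yields `"openai-completions"`, while unrelated providers keep whatever the probe reported.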

Fixes #976

Summary by CodeRabbit

  • Bug Fixes
    • Enforced routing of local inference and vLLM tool calls through the chat-completions endpoint, standardizing behavior and improving tool-call parsing and compatibility for local setups.
  • Tests
    • Added integration tests that verify the enforced inference API selection and expected logging to prevent regressions.


vLLM's /v1/responses endpoint does not run the --tool-call-parser,
so tool calls arrive as raw text in the response content instead of
the structured tool_calls array. The probe picks openai-responses
because vLLM accepts the request, but parsing only works on
/v1/chat/completions.

Override preferredInferenceApi to openai-completions after validation
for both vllm and nim-local provider paths.

Fixes NVIDIA#976
Copilot AI review requested due to automatic review settings March 26, 2026 19:17

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.




coderabbitai bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

The PR forces preferredInferenceApi = "openai-completions" for the NIM-local and local vLLM onboarding paths in bin/lib/onboard.js, and adds tests verifying that onboarding probes reporting openai-responses are overridden so requests use /v1/chat/completions.

Changes

  • Onboarding API Selection Override (bin/lib/onboard.js): In setupNim(gpu) and the local vLLM selection branch, override the probed preferredInferenceApi to "openai-completions" and log when the override occurs, so sandboxed inference is routed to /v1/chat/completions.
  • Tests: onboarding selection & overrides (test/onboard-selection.test.js): Adds integration tests that simulate choosing local vLLM and NIM-local while mocked probes report openai-responses; asserts preferredInferenceApi is forced to openai-completions and checks for expected vLLM/NIM tooling log output.

Sequence Diagram(s)

sequenceDiagram
  participant Onboard as Onboard script
  participant Probe as Endpoint probe
  participant Config as Sandbox config
  participant Client as Agent/Client
  participant vLLM as vLLM/NIM local
  participant ToolParser as Tool-Call Parser
  participant Tool as Tool/Executor

  Onboard->>Probe: probe /v1/responses and /v1/models
  Probe-->>Onboard: reports openai-responses available
  Onboard->>Config: set preferredInferenceApi = "openai-completions" (override)
  Client->>Config: send inference request
  Config->>vLLM: forward to /v1/chat/completions
  vLLM->>ToolParser: produce structured tool_call data
  ToolParser->>Tool: execute tool call
  Tool-->>Client: return tool result via chat response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nudged the probe, gave responses a shove,
Switched to completions for the parser I love.
Tool calls now hop out tidy and bright,
Local vLLM dances, tools work just right. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: forcing the chat completions API for vLLM and NIM-local providers, which is the core fix in the changeset.
  • Linked Issues check: ✅ Passed. The PR implementation fully addresses issue #976's coding requirements: overriding preferredInferenceApi to openai-completions in both vllm and nim-local provider paths, with comprehensive test coverage validating the override behavior.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to issue #976: code modifications in onboard.js target vllm and nim-local API routing, and tests verify these specific overrides with no extraneous changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
bin/lib/onboard.js (1)

1869-1871: Add a regression test for the responses-first probe path.

Current coverage only proves that getSandboxInferenceConfig() respects an explicit preferred API. It does not lock down the new behavior where setupNim() must return openai-completions for vllm and nim-local even when the probe reports openai-responses first. A focused test here would make this fix much harder to regress.

Also applies to: 1988-1991

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1869 - 1871, Add a regression test that
simulates the probe returning "openai-responses" first and verifies that
setupNim() (and indirectly getSandboxInferenceConfig()) forces
preferredInferenceApi to "openai-completions" for engines "vllm" and
"nim-local"; specifically, mock the probe result to return openai-responses,
call setupNim() for both engine types, and assert the returned/derived
preferredInferenceApi equals "openai-completions" to lock down the
responses-first probe path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1869-1871: After forcing preferredInferenceApi =
"openai-completions" add an explicit message into the onboarding output so the
override is echoed to the user; locate the code that previously logged the
probed API (validateOpenAiLikeSelection()) and the onboarding message builder
that prints the selected provider, and append or update that onboarding text to
include that OpenClaw is forcing "openai-completions" (use the same variable
preferredInferenceApi). Do the same change for the other identical override site
later in the file where preferredInferenceApi is set (the second block that
mirrors this behavior) so both onboarding outputs reflect the forced override.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0342752e-7456-4fa0-b3c8-550c9219f25d

📥 Commits

Reviewing files that changed from the base of the PR and between 6d44def and 2f6dfd0.

📒 Files selected for processing (1)
  • bin/lib/onboard.js

benedikt and others added 3 commits March 26, 2026 19:25
  • Surface the API override during onboard so users see why the responses API was not selected.
  • Verifies that setupNim() forces preferredInferenceApi to openai-completions for the vLLM provider path even when the probe detects openai-responses first. This locks down the fix for NVIDIA#976 so the tool-call-parser override cannot silently regress.
@BenediktSchackenberg (Contributor, Author)

Added a regression test as suggested by CodeRabbit — it verifies that setupNim() forces openai-completions for the vLLM path even when the probe detects openai-responses first. All 15 selection tests pass locally.

I'd appreciate it if a maintainer could trigger CI on this; since it's a first-time contributor fork, the run is waiting on approval. Thanks!


coderabbitai bot left a comment


🧹 Nitpick comments (1)
test/onboard-selection.test.js (1)

1308-1405: Add a companion test for nim-local override parity.

This PR objective covers both vllm and nim-local, but the new coverage here only guards vllm. A parallel nim-local case would prevent silent regressions in the second forced-override path.

♻️ Suggested follow-up (test skeleton)
+ it("forces openai-completions for nim-local even when probe detects openai-responses", () => {
+   // mirror the vLLM structure:
+   // - select nim-local provider
+   // - make probe return /v1/responses success
+   // - assert preferredInferenceApi === "openai-completions"
+   // - assert override log line is emitted
+ });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard-selection.test.js` around lines 1308 - 1405, Add a parallel test
in the same file that mirrors the vLLM test but exercises the nim-local
forced-override path: replicate the test that writes a fake curl and script
invoking setupNim, but change the interactive answer sequence (via
credentials.prompt) to select the nim-local option, ensure runner.runCapture
responds to model probes similarly, then assert payload.result.provider ===
"nim-local", payload.result.preferredInferenceApi === "openai-completions", and
payload.result.model matches the detected nim-local model; also assert the
captured console lines include the equivalent "Using existing ..." log for
nim-local and the "tool-call-parser requires" warning. Use the same helpers
(setupNim, credentials.prompt, runner.runCapture) and test scaffolding as the
vLLM case to guarantee parity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f569ae45-9c39-46c7-b645-5e4c322a6f5d

📥 Commits

Reviewing files that changed from the base of the PR and between 861581e and 6c43ed6.

📒 Files selected for processing (1)
  • test/onboard-selection.test.js

@BenediktSchackenberg (Contributor, Author)

@cv Would you mind triggering CI on this one? Added a regression test for the override as well. Thanks!

Companion test for the vLLM case: verifies that setupNim() forces
openai-completions for the NIM-local path too, since NIM uses vLLM
internally and has the same tool-call-parser limitation.
@BenediktSchackenberg (Contributor, Author)

Added the companion NIM-local regression test as suggested by CodeRabbit — mocks the full NIM onboarding flow (listModels, pullNimImage, startNimContainerByName, waitForNimHealth) and verifies the same openai-completions override applies. All 16 selection tests pass.



Development

Successfully merging this pull request may close these issues.

Tool calls broken with local vLLM provider — openai-responses bypasses tool-call-parser

4 participants