
Conversation

hrithiksagar

Fix: Resolve critical bugs in vLLM Online and Offline inference

  • Online Inference:
    Updated the installation instructions in the requirements file and README. The previous setup referenced an outdated prebuilt vLLM build; the new stable release changed the installation method, which is now documented correctly.

  • Offline Inference:
    Addressed a breaking change in llm.generate(<parameters>) caused by a deprecation in vLLM ≥ 0.10.2.
    Replaced the old input handling with the new TokensPrompt interface (from vllm.inputs import TokensPrompt) so the example stays compatible with the latest vLLM API; see the sketch after this list.
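
A minimal sketch of the updated offline call, assuming vLLM ≥ 0.10.2. The model name, sampling settings, and token IDs below are placeholders for illustration, not the exact values from the example:

    from vllm import LLM, SamplingParams
    from vllm.inputs import TokensPrompt

    # Placeholder model and inputs, for illustration only.
    llm = LLM(model="openai/gpt-oss-20b")
    sampling = SamplingParams(temperature=1.0, max_tokens=128)
    prefill_ids = [1, 2, 3]  # pre-tokenized prompt IDs produced elsewhere

    # vLLM >= 0.10.2: wrap the token IDs in TokensPrompt instead of passing
    # prompt_token_ids= directly to llm.generate().
    outputs = llm.generate(
        [TokensPrompt(prompt_token_ids=prefill_ids)],  # batch of size 1
        sampling_params=sampling,
    )
    print(outputs[0].outputs[0].text)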

@hrithiksagar
Author

@dkundel-openai Could you please take a look? It took quite some time and research to resolve this bug.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines 167 to 169

     outputs = llm.generate(
    -    prompt_token_ids=[prefill_ids],  # batch of size 1
    +    [TokensPrompt(prompt_token_ids=prefill_ids)],
         sampling_params=sampling,


P1: Import TokensPrompt for new vLLM API

The offline inference example now calls llm.generate([TokensPrompt(prompt_token_ids=prefill_ids)], …) but the snippet still only imports LLM and SamplingParams. Because TokensPrompt is not imported (e.g., from vllm.inputs import TokensPrompt), anyone copying this updated code will hit a NameError before reaching generation. Consider adding the import alongside the other vLLM imports so the example runs as advertised.
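
For reference, a hedged sketch of how the import block might look once TokensPrompt is added, assuming the example currently imports only LLM and SamplingParams:

    from vllm import LLM, SamplingParams
    from vllm.inputs import TokensPrompt  # needed for the new token-based prompt interface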

Useful? React with 👍 / 👎.

@hrithiksagar changed the title from "Update README.md" to "Fix: Resolve critical bugs in vLLM Online and Offline inference" on Oct 7, 2025