
Conversation

hrithiksagar

Fix: Resolve critical bugs in vLLM Online and Offline inference

  • Online Inference:
    Updated the installation instructions in the requirements file and README. The previous setup referenced an outdated prebuilt vLLM build; the new stable release changed the installation method, which is now documented correctly.

  • Offline Inference:
    Addressed a breaking change in llm.generate(<parameters>) caused by a deprecation in vLLM ≥ 0.10.2.
    Replaced the old input handling with the new TokensPrompt interface (from vllm.inputs import TokensPrompt) so the example stays compatible with the latest vLLM API; see the sketch after this list.
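
A minimal sketch of the updated offline call, assuming vLLM ≥ 0.10.2. The model name, sampling settings, and token IDs below are placeholders for illustration, not the exact values from the example:

    from vllm import LLM, SamplingParams
    from vllm.inputs import TokensPrompt

    # Placeholder model and inputs, for illustration only.
    llm = LLM(model="openai/gpt-oss-20b")
    sampling = SamplingParams(temperature=1.0, max_tokens=128)
    prefill_ids = [1, 2, 3]  # pre-tokenized prompt IDs produced elsewhere

    # vLLM >= 0.10.2: wrap the token IDs in TokensPrompt instead of passing
    # prompt_token_ids= directly to llm.generate().
    outputs = llm.generate(
        [TokensPrompt(prompt_token_ids=prefill_ids)],  # batch of size 1
        sampling_params=sampling,
    )
    print(outputs[0].outputs[0].text)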

@hrithiksagar
Author

@dkundel-openai Could you please take a look? It took quite some time and research to resolve this bug.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines 167 to 169

     outputs = llm.generate(
    -    prompt_token_ids=[prefill_ids],  # batch of size 1
    +    [TokensPrompt(prompt_token_ids=prefill_ids)],
         sampling_params=sampling,


P1: Import TokensPrompt for new vLLM API

The offline inference example now calls llm.generate([TokensPrompt(prompt_token_ids=prefill_ids)], …) but the snippet still only imports LLM and SamplingParams. Because TokensPrompt is not imported (e.g., from vllm.inputs import TokensPrompt), anyone copying this updated code will hit a NameError before reaching generation. Consider adding the import alongside the other vLLM imports so the example runs as advertised.
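
For reference, a hedged sketch of how the import block might look once TokensPrompt is added, assuming the example currently imports only LLM and SamplingParams:

    from vllm import LLM, SamplingParams
    from vllm.inputs import TokensPrompt  # needed for the new token-based prompt interface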

Useful? React with 👍 / 👎.

@hrithiksagar changed the title from "Update README.md" to "Fix: Resolve critical bugs in vLLM Online and Offline inference" on Oct 7, 2025