17 changes: 16 additions & 1 deletion README.md
@@ -89,6 +89,21 @@ uv pip install --pre vllm==0.10.1+gptoss \
vllm serve openai/gpt-oss-20b
```

If the above installation does not work, the following steps set up a working environment for online inference:
```
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.12 python3.12-venv python3.12-dev -y
python3.12 --version
python3.12 -m venv .oss
source .oss/bin/activate
pip install -U uv
uv pip install vllm==0.10.2 --torch-backend=auto
# uv pip install openai-harmony  # optional for online serving, but required for offline serving
# main command to start the online inference server
vllm serve openai/gpt-oss-20b --async-scheduling
```
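
Once the server is running, it exposes an OpenAI-compatible API. Below is a minimal sketch of querying it with the `openai` Python client, assuming the server is listening on the default `http://localhost:8000/v1` endpoint; adjust the base URL if you passed `--host`/`--port` to `vllm serve`.
```
from openai import OpenAI

# vLLM ignores the API key, but the client requires one, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```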

[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)

Offline Serve Code:
@@ -150,7 +165,7 @@ sampling = SamplingParams(
)

outputs = llm.generate(
-    prompt_token_ids=[prefill_ids],  # batch of size 1
+    [TokensPrompt(prompt_token_ids=prefill_ids)],
sampling_params=sampling,
Comment on lines 167 to 169

P1: Import TokensPrompt for new vLLM API

The offline inference example now calls llm.generate([TokensPrompt(prompt_token_ids=prefill_ids)], …) but the snippet still only imports LLM and SamplingParams. Because TokensPrompt is not imported (e.g., from vllm.inputs import TokensPrompt), anyone copying this updated code will hit a NameError before reaching generation. Consider adding the import alongside the other vLLM imports so the example runs as advertised.

)
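
As a rough sketch of the fix this comment suggests, the imports and call could look like the following; `prefill_ids` is a placeholder here, whereas the README example builds it from the encoded prompt:
```
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt  # the import the updated example needs

llm = LLM(model="openai/gpt-oss-20b")
sampling = SamplingParams(max_tokens=64)
prefill_ids = [1, 2, 3]  # placeholder prompt token IDs

outputs = llm.generate(
    [TokensPrompt(prompt_token_ids=prefill_ids)],  # batch of size 1
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)
```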
