Update vllm.py to support the V1 engine #53
base: main
Conversation
…hch is necessary for gpt-oss.
DRMacIver left a comment
As well as the comments I've added about random details, I'm afraid I'm not keen on the way you've added this.
I would like to see some tests demonstrating that this works, which also requires that the code actually be runnable (which it currently isn't).
In order to do that, I think we need to be able to run both V0 and V1 in the same process, which requires not having this hard-coded at import time, e.g. by providing a flag to the constructor for AsyncVirtualLM. (Unless there's some reason why we can't do that? In which case we might need different testing environments for different versions of vllm, and we should talk about how we might set that up.)
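To make the suggestion concrete, here is a minimal sketch of the kind of constructor flag I have in mind (AsyncVirtualLM is the class in this repo; the parameter names and the engine-construction stub are illustrative, not the actual API):

```python
import os


class AsyncVirtualLM:
    """Sketch only: choose the vllm engine per instance instead of at import time."""

    def __init__(self, model_name, use_v1=False, logprobs_per_request=256, **engine_opts):
        # use_v1 and logprobs_per_request are hypothetical names; the point is that
        # the choice is made per instance, so V0 and V1 can both be exercised in tests
        # without editing a module-level constant.
        self.use_v1 = use_v1
        self.logprobs_per_request = logprobs_per_request
        # VLLM_USE_V1 is the environment variable behind vllm's envs.VLLM_USE_V1 flag
        # mentioned in the PR description.
        os.environ["VLLM_USE_V1"] = "1" if use_v1 else "0"
        self._engine = self._build_engine(model_name, **engine_opts)

    def _build_engine(self, model_name, **engine_opts):
        # Placeholder: real engine construction (vllm's async engine) would go here.
        return None
```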
notes/Untitled.ipynb
Outdated
What's this doing here?
notes/playground.ipynb
Outdated
Same question. This seems entirely empty.
genlm/backend/llm/vllm.py
Outdated
@@ -1,3 +1,7 @@
FORCE_V0 = True #Currently, we force thw model to use V0, to switch to V1 simply set this to False
Nitpicky typo comments: thw. Also conventionally there's a space after the #
More importantly... I'm not thrilled about this hardcoded constant where you have to change the source code for any of the code you've added to be reachable.
genlm/backend/llm/vllm.py
Outdated
@@ -1,3 +1,7 @@
FORCE_V0 = True #Currently, we force thw model to use V0, to switch to V1 simply set this to False
LOGPROBS_PER_REQUEST = 256 #These are th elogprobs that are retrieved currently in V1
Same comment as above RE #, also th e.
genlm/backend/llm/hf.py
Outdated
return cls(mod, tok, **kwargs)

# @classmethod
# def from_name(cls, model_id, bitsandbytes_opts=None, hf_opts=None, **kwargs):
What's with all this commented out code?
…select either V1 or V0 by passing a flag to the constructor.
I added some tests, and also changed the structure: now we can switch between V0 and V1 by passing a variable to the constructor.
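For example (parameter names here follow the sketch above and are assumptions, not the exact signature), the tests can now build both variants in the same process:

```python
# Hypothetical usage; only AsyncVirtualLM is taken from the repo, the rest is assumed.
llm_v0 = AsyncVirtualLM("gpt2", use_v1=False)
llm_v1 = AsyncVirtualLM("gpt2", use_v1=True, logprobs_per_request=256)
```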
Codecov Report: ❌ Patch coverage is
Merge with Ben's merge fix on pyproject.toml
… input to next_token_logprobs in vllm.py (there is no need to consider the case where we pass a string as input)
@DRMacIver the issues should be fixed now.
I updated vllm.py to support the V1 engine. By default, the V0 engine is still used, unless FORCE_V0 is set to False and the V1 engine is available in the installed version of vllm (i.e., envs.VLLM_USE_V1 is True). Since accessing the logprobs is slow in V1, we only retrieve the logprobs for the most probable tokens; the exact number of retrieved tokens can be controlled with LOGPROBS_PER_REQUEST (256 by default). The remaining probability mass is distributed among the other tokens.
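For illustration, filling in the non-retrieved tokens could look roughly like this (a sketch, not the code in vllm.py; I'm assuming the leftover mass is spread uniformly, and the function and variable names are mine):

```python
import torch


def expand_topk_logprobs(top_ids, top_logprobs, vocab_size):
    """Turn the top-K logprobs returned by V1 into a full-vocabulary log-distribution.

    top_ids: LongTensor of K token ids; top_logprobs: Tensor of their logprobs.
    The probability mass not covered by the K retained tokens is shared
    (here: uniformly) among the other vocab_size - K tokens.
    """
    top_probs = torch.exp(top_logprobs)                # probabilities of retained tokens
    leftover = (1.0 - top_probs.sum()).clamp(min=0.0)  # mass not covered by the top K
    fill = (leftover / (vocab_size - top_ids.numel())).item()
    probs = torch.full((vocab_size,), fill, dtype=top_probs.dtype)
    probs[top_ids] = top_probs
    return torch.log(probs)
```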
IMPORTANT NOTE: in order to support the gpt-oss mxfp4 quantization (vllm has worked out a way to make it work on the A100, and currently mxfp4 is the only supported quantization), I had to update the dependencies in pyproject.toml to allow vllm 0.10.2. However, the current V0 implementation does not support vllm > 0.10.0 (the "disable_log_requests" option is not supported), which means that to use the V0 engine you should have vllm <= 0.10.0.
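If it helps, the vllm <= 0.10.0 constraint for the V0 path could also be enforced at runtime with a guard along these lines (just a sketch, not part of the PR; it assumes the packaging library is available):

```python
from importlib.metadata import version
from packaging.version import Version


def check_v0_compatible():
    # The V0 path relies on the disable_log_requests option, which is not
    # supported in vllm > 0.10.0, so fail early with a clear message.
    installed = Version(version("vllm"))
    if installed > Version("0.10.0"):
        raise RuntimeError(
            f"vllm {installed} detected: the V0 engine path requires vllm <= 0.10.0."
        )
```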