@ywang96 ywang96 commented Oct 1, 2025

Purpose

Since this model performs well on text-only tasks, we might want to allow people to serve it as a text-only model.

Test Plan

Test Result

Running vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct --limit-mm-per-prompt.image 0 --limit-mm-per-prompt.video 0 --load-format dummy -tp 8 shows the following in the logs:

(APIServer pid=6540) INFO 10-01 01:53:40 [registry.py:117] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
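
The log message suggests a simple rule: text-only mode applies only when every modality the model supports has its limit set to 0. A minimal sketch of that rule follows. The function name `is_text_only`, its arguments, and the treatment of a missing limit as "enabled" are assumptions made for illustration, not vLLM's actual registry code.

```python
from typing import Iterable, Mapping


def is_text_only(supported_modalities: Iterable[str],
                 limits: Mapping[str, int]) -> bool:
    """Return True when every supported modality is limited to 0 items.

    Assumption: a modality that is absent from `limits` counts as enabled,
    so it is given a non-zero placeholder limit of 1 here.
    """
    modalities = list(supported_modalities)
    # A model with no multimodal modalities is not "switched" to text-only.
    if not modalities:
        return False
    return all(limits.get(modality, 1) == 0 for modality in modalities)


# Mirrors the flags in the command above: image and video both set to 0.
both_disabled = is_text_only(["image", "video"], {"image": 0, "video": 0})
# Only one modality disabled: the vision components are still needed.
one_disabled = is_text_only(["image", "video"], {"image": 0})
```

Under this sketch, disabling only images keeps the model in multimodal mode, because video inputs would still need the vision encoder.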

Confirm vision model weights are not loaded:
Without setting limit:

(Worker_TP4 pid=7041) INFO 10-01 01:54:18 [gpu_model_runner.py:2758] Model loading took 55.4919 GiB and 0.376107 seconds

Without setting limit + DP ViT:

(Worker_TP3 pid=21254) INFO 10-01 02:11:51 [gpu_model_runner.py:2758] Model loading took 56.4331 GiB and 0.335513 seconds

Setting all limits to 0:

(Worker_TP6 pid=17841) INFO 10-01 02:09:06 [gpu_model_runner.py:2758] Model loading took 55.1608 GiB and 0.320821 seconds
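
To make the comparison concrete, the three log lines above can be parsed and subtracted. This is a rough sketch: the dictionary keys and the helper name `loaded_gib` are made up for illustration, and the three lines come from different worker ranks, so the differences are only approximate per-worker figures.

```python
import re

# Log lines copied from the test results above.
LOG_LINES = {
    "default": "(Worker_TP4 pid=7041) INFO 10-01 01:54:18 [gpu_model_runner.py:2758] "
               "Model loading took 55.4919 GiB and 0.376107 seconds",
    "dp_vit": "(Worker_TP3 pid=21254) INFO 10-01 02:11:51 [gpu_model_runner.py:2758] "
              "Model loading took 56.4331 GiB and 0.335513 seconds",
    "text_only": "(Worker_TP6 pid=17841) INFO 10-01 02:09:06 [gpu_model_runner.py:2758] "
                 "Model loading took 55.1608 GiB and 0.320821 seconds",
}

SIZE_PATTERN = re.compile(r"Model loading took ([\d.]+) GiB")


def loaded_gib(line: str) -> float:
    """Extract the reported model size in GiB from one log line."""
    match = SIZE_PATTERN.search(line)
    if match is None:
        raise ValueError(f"no model size found in: {line!r}")
    return float(match.group(1))


sizes = {name: loaded_gib(line) for name, line in LOG_LINES.items()}

# Memory no longer used per worker once all multimodal limits are 0.
saved_vs_default = round(sizes["default"] - sizes["text_only"], 4)
saved_vs_dp_vit = round(sizes["dp_vit"] - sizes["text_only"], 4)

print(f"saved vs. default: {saved_vs_default} GiB")
print(f"saved vs. DP ViT:  {saved_vs_dp_vit} GiB")
```

With these numbers, text-only mode reports about 0.33 GiB less per worker than the default setup and about 1.27 GiB less than the DP ViT setup.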

Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 requested a review from sighingnow as a code owner October 1, 2025 01:24
@ywang96 ywang96 changed the title from "[MM] Add text-only model for Qwen3-VL" to "[MM] Add text-only mode for Qwen3-VL" Oct 1, 2025
@mergify mergify bot added the qwen label (Related to Qwen models) Oct 1, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to add text-only support for Qwen3-VL by conditionally initializing the visual components. The changes correctly identify the parts of the code that need to be conditional (Qwen3_VisionTransformer initialization, deepstack_input_embeds initialization, and weight loading). However, there is a critical logic error in the condition for initializing the visual model, which inverts the intended behavior. This would cause the model to initialize visual components in text-only mode and skip them in multimodal mode. I've provided a suggestion to fix this. The other related changes are correct, assuming this primary logic is fixed.
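
The intended behavior described in the review can be sketched as follows. This is a simplified stand-in, not the PR's actual diff: `VisionTowerStub`, `ModelSketch`, the `text_only` flag, and the `visual.` weight-name prefix are all hypothetical names chosen for illustration.

```python
from typing import Iterable, List, Optional


class VisionTowerStub:
    """Hypothetical placeholder for the vision encoder (no real weights)."""


class ModelSketch:
    """Toy model that builds its vision tower only when it is needed."""

    def __init__(self, text_only: bool) -> None:
        self.text_only = text_only
        # Correct condition: build the vision tower when NOT in text-only
        # mode. The inverted form (`if text_only`) is the kind of bug the
        # review describes.
        self.visual: Optional[VisionTowerStub] = (
            None if text_only else VisionTowerStub()
        )

    def load_weights(self, names: Iterable[str]) -> List[str]:
        """Return the weight names that would be loaded.

        Assumption: vision weights share a `visual.` name prefix.
        """
        loaded = []
        for name in names:
            if self.visual is None and name.startswith("visual."):
                continue  # no vision tower exists, so skip its weights
            loaded.append(name)
        return loaded


checkpoint = ["visual.blocks.0.weight", "language_model.embed.weight"]
text_model = ModelSketch(text_only=True)
mm_model = ModelSketch(text_only=False)
```

In this sketch, `text_model.load_weights(checkpoint)` skips the `visual.` entry, while `mm_model.load_weights(checkpoint)` keeps both entries.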

ywang96 and others added 2 commits September 30, 2025 18:26
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 1, 2025
@simon-mo simon-mo enabled auto-merge (squash) October 1, 2025 02:16
@simon-mo simon-mo added this to the v0.11.0 Cherry Picks milestone Oct 1, 2025
@simon-mo simon-mo merged commit 66bca9b into vllm-project:main Oct 1, 2025
54 of 56 checks passed
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025