Your current environment
PyTorch 2.7.1, vLLM 0.10.1.1, 8× H20 GPUs
How would you like to use vllm
I'm currently using Qwen2.5-VL 32B for multi-modal inference. From the torch profiler results, I can see that the vision encoder takes a long time during the prefill stage. Is there any way to speed up the vision encoder (e.g., via compilation)? Thanks!
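For context, one thing I'm considering (not yet verified to help) is capping the image pixel budget through `mm_processor_kwargs`, which is documented for the Qwen2-VL family, so that large images are downscaled and the vision encoder has fewer patches to process during prefill. A minimal sketch, assuming that pass-through also applies to Qwen2.5-VL 32B; the `min_pixels`/`max_pixels` values, model path, and prompt template below are placeholders:

```python
# Minimal sketch: cap the processor's pixel budget so large images are downscaled
# before they reach the vision encoder. Values below are placeholders to tune.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-32B-Instruct",   # assumed model path
    tensor_parallel_size=8,
    mm_processor_kwargs={
        "min_pixels": 28 * 28,          # placeholder lower bound
        "max_pixels": 1280 * 28 * 28,   # placeholder upper bound
    },
)

image = Image.new("RGB", (1024, 1024))  # stand-in for a real input image
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe the image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

This only reduces the encoder's workload by shrinking the inputs, though, so I'd still like to know whether the vision encoder itself can be compiled or otherwise accelerated.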
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.