
Conversation

@charitarthchugh (Contributor) commented Sep 25, 2025

The goal of this PR is to supplement the default vLLM arguments to increase inference speed. I have tested this on an L40S and it has been working fine, with more than two weeks of combined uptime running PDF conversion.

Changes proposed in this pull request:

  • --limit-mm-per-prompt lets vLLM skip allocating GPU memory for the video encoder, leaving more room for the KV cache.
  • --enable-chunked-prefill splits prefill of the prompt tokens into chunks instead of processing them all at once, allowing for more efficient memory use. (Both settings are illustrated in the sketch after this list.)
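
For reference, here is a minimal sketch of how these two settings map onto vLLM's offline Python API; the model name and the exact multimodal limits are illustrative assumptions, not values taken from this PR:

```python
# Minimal sketch (assumed model name and limits; adjust for your deployment).
# --limit-mm-per-prompt   -> limit_mm_per_prompt: cap multimodal inputs per prompt;
#                            setting "video" to 0 means no GPU memory is reserved
#                            for the video encoder, freeing space for the KV cache.
# --enable-chunked-prefill -> enable_chunked_prefill: prefill long prompts in chunks.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",             # assumed VLM; substitute the real model
    limit_mm_per_prompt={"image": 1, "video": 0},  # one image, no video encoder allocation
    enable_chunked_prefill=True,                   # stream prompt tokens through prefill
    gpu_memory_utilization=0.9,
)

params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate("Describe the layout of this page.", params)
print(outputs[0].outputs[0].text)
```

The server-side flags proposed in this PR correspond to the same engine arguments; the constructor form is shown here only because its value types (dict and bool) make the intent explicit.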

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@jakep-allenai (Collaborator) commented:

Sorry, been sick, just getting to this now. Let me run it locally and see what the numbers are looking like.

@jakep-allenai (Collaborator) commented:

--limit-mm-per-prompt sounds like a great idea, but it seems like vLLM V1 is going to have chunked prefill always on by default. I'm waiting for benchmarks to run on our cluster here.

@jakep-allenai (Collaborator) commented:

OK, nice find on disabling the video encoder; it really makes a difference on smaller GPU cards! There's a small syntax error, but I'm going to merge this in, as I want to push a PR today that moves things over to vLLM 0.11 officially as well.

@jakep-allenai merged commit 2b70b50 into allenai:main on Oct 6, 2025
8 of 10 checks passed