Fix vLLM queue overflow with serialized semaphore release #316
base: main
Conversation
Hmm, what GPU were you testing with? On an H100, you can do 600 pages at a time and it's quite good. How much RAM is on your machine? Also, did you try with v0.3.3? Because I had fixed a root-cause issue of vLLM using too many worker threads.
I'm testing on a 4090 (24 GB of VRAM) on a system with 20 GB of DDR4. What I see happening (after adding some debug code) with pages_per_group=100 looks more or less like this: Yes, I did use v0.3.3.
Weird, the idea is that it should not submit more until the queue goes down a bit more. I feel like there is some simpler way to fix this, like don't unlock the semaphore until at least one page processes in the previous worker.
The only real advantage of my approach is smoother queue depth. With your approach the queue would jump between 50→550→50→550.
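The alternative floated above, admitting the next worker only after the previous one has processed at least one page, could be sketched with an `asyncio.Event` gate. This is a hypothetical illustration, not olmOCR's actual code; `worker`, `run_groups`, and `process_page` are invented names, and it assumes every group has at least one page (an empty group's event would never fire).

```python
import asyncio

async def worker(pages, process_page, first_page_done: asyncio.Event) -> None:
    """Process one group of pages; signal after the first page completes."""
    for i, page in enumerate(pages):
        await process_page(page)
        if i == 0:
            first_page_done.set()  # safe to admit the next worker now

async def run_groups(groups, process_page) -> None:
    """Start each group only after the previous group finished a page."""
    prev: asyncio.Event | None = None
    tasks = []
    for pages in groups:
        if prev is not None:
            await prev.wait()  # gate on the previous worker's progress
        prev = asyncio.Event()
        tasks.append(asyncio.create_task(worker(pages, process_page, prev)))
    await asyncio.gather(*tasks)
```

As the comment above notes, this trades the smoother queue depth of the lock-based approach for simplicity: submissions arrive in group-sized steps rather than being metered page by page.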
Force-pushed from b31376d to 0c74322
Multiple workers could acquire the semaphore in rapid succession when the queue dropped, causing bursts of 1000+ page submissions and vLLM crashes. The race condition was in the semaphore release logic: multiple threads could evaluate the conditions and release simultaneously before the queue depth updated. Add an asyncio.Lock() to serialize release checks, ensuring atomic evaluation and release. All condition checks now happen inside the lock.
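The race described in the commit message can be illustrated with a minimal sketch. This is not olmOCR's code: `MAX_QUEUE`, `queue_depth`, and `worker` are invented names, and the shared queue depth is deliberately frozen to mimic the window before vLLM reports new submissions. Without synchronization, every worker reads the same stale depth, decides a release is safe, and the semaphore is released many times at once.

```python
import asyncio

MAX_QUEUE = 500
queue_depth = 450  # shared state; in the real system it only updates
                   # after submissions land, so all workers see 450

async def worker(sem: asyncio.Semaphore, releases: list) -> None:
    # Unsynchronized check-then-release: each worker independently sees
    # queue_depth < MAX_QUEUE, so all of them release.
    if queue_depth < MAX_QUEUE:
        sem.release()
        releases.append(1)

async def main() -> int:
    sem = asyncio.Semaphore(0)
    releases: list = []
    await asyncio.gather(*(worker(sem, releases) for _ in range(10)))
    return len(releases)  # 10 releases instead of 1 → a burst of submissions
```

Ten workers releasing back-to-back is exactly the burst pattern that drove the queue past 600 pages.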
Force-pushed from 0c74322 to 0742014
Tested locally on my 4090.
Problem
When processing large batches of PDFs, multiple workers could acquire the semaphore in rapid succession, causing bursts of 1000+ page submissions that led to vLLM crashes (queue depth 600+, KV cache 97-99%).
Solution
Add asyncio.Lock() to serialize semaphore release checks. This ensures atomic evaluation of the conditions and prevents the race where multiple workers release simultaneously.
Changes
- Add release_lock to make release decisions atomic
Testing
Tested with 150 PDFs - queue stayed under 460 (previously crashed at 600+) and no double releases occurred.