fix(sglang): support SGLang 0.5.11+ (version detection + refactored leak check) by RixinLiu · Pull Request #353 · ovg-project/kvcached

RixinLiu · 2026-06-03T11:55:07Z

Summary

Two general (non-AMD) compatibility defects in the SGLang integration, found while porting kvcached to AMD. Bug #2 affects SGLang 0.5.11+ (verified on the 0.5.12.post1 wheel); Bug #1 affects any build that doesn't expose sglang.__version__. The fix was additionally checked against 0.4.9 (the oldest version kvcached supports) and 0.5.10 (the last version before the break) to confirm no regression on the older, still-working layouts.

Bug #1: version detection misses builds without a module-level `__version__`

VersionManager.detect_version() only reads module attributes (__version__, then version/VERSION/_version). Some SGLang builds (notably source builds) expose none of these on the sglang module, even though the installed-package metadata carries the version. When detection returns None, every SGLang patch bails and kvcached silently becomes a no-op:

[kvcached][WARNING] Could not detect version for sglang
[kvcached][WARNING] Failed to apply elastic_memory_pool / ...
[kvcached][WARNING] Failed to patch sglang: elastic_allocator, ...

Fix: fall back to importlib.metadata.version(library_name) when no module attribute is found.
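The fallback can be sketched roughly as follows. This is a minimal standalone illustration, not the actual `VersionManager.detect_version()` code: the free function, the `_VERSION_ATTRS` tuple, and the string check are assumptions made for the example. It also assumes the import name equals the distribution name, which is the case for `sglang` but not for every package.

```python
import importlib
from importlib import metadata
from typing import Optional

# Attribute names probed on the module, in order (mirrors the list in the text).
_VERSION_ATTRS = ("__version__", "version", "VERSION", "_version")


def detect_version(library_name: str) -> Optional[str]:
    """Return the library's version string, or None if it cannot be found."""
    try:
        module = importlib.import_module(library_name)
    except ImportError:
        module = None

    # 1) Module attributes. Only accept non-empty strings, because an
    #    attribute such as `version` can also be a submodule.
    for attr in _VERSION_ATTRS:
        value = getattr(module, attr, None)
        if isinstance(value, str) and value:
            return value

    # 2) Fallback: installed-package metadata, which source builds still carry.
    try:
        return metadata.version(library_name)
    except metadata.PackageNotFoundError:
        return None
```

With this ordering, builds that expose `__version__` behave as before, and the metadata lookup only runs when every module attribute is missing.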

Bug #2: `scheduler_memory_leak` patch doesn't match SGLang's newer leak-check layout

kvcached maps physical KV pages lazily, so SGLang's static-pool invariant (total == available + in-use) doesn't hold for the KV pool; the scheduler_memory_leak patch exists to neutralize SGLang's leak detector for that pool. The old patch scanned for a single Scheduler method containing both "memory leak detected" and "token_to_kv_pool_allocator".

Through 0.5.10 that still worked, because check_memory names the KV allocator. From 0.5.11 on, SGLang routes the KV-pool leak through a generic _report_leak(pool_name, …) that names no pool. No single method then contains both literals, so the patch matches nothing and fails to apply, and SGLang's own detector crashes the scheduler:

[kvcached][WARNING] Failed to apply scheduler_memory_leak
ValueError: pool memory leak detected! [full] total=..., available=..., evictable=0 ...
Received sigquit from a child process.   # exit 137

Fix: select the Scheduler leak-raisers by source ("memory leak detected") but skip any check specific to req_to_token_pool — a pool kvcached does not manage, whose invariant still holds, and whose alarm must stay live so a genuine request-pool leak still surfaces. The KV/token-pool checks are kept (the old combined check names token_to_kv_pool_allocator; the new generic _report_leak names no pool and is only ever called for token pools). This covers the old single-method layout and the new split/generic layout without over-suppressing unrelated alarms.
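The selection rule can be illustrated with a small sketch. Everything here is hypothetical scaffolding rather than the code in `patches.py`: `find_kv_leak_raisers`, `neutralize`, and `ToyScheduler` are made-up names, treating "specific to req_to_token_pool" as "mentions `req_to_token_pool` but not `token_to_kv_pool_allocator`" is an assumption, and swallowing the `ValueError` is only one possible way to neutralize a check.

```python
import functools
import inspect

LEAK_MARKER = "memory leak detected"
REQ_POOL_MARKER = "req_to_token_pool"
KV_POOL_MARKER = "token_to_kv_pool_allocator"


def find_kv_leak_raisers(cls, get_source=inspect.getsource):
    """Names of methods on cls (including mixins) that raise the KV-pool leak error."""
    selected = []
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        try:
            source = get_source(func)
        except (OSError, TypeError):
            continue  # no source available for this function
        if LEAK_MARKER not in source:
            continue
        # Assumed rule: a check that names only the request pool is left live.
        if REQ_POOL_MARKER in source and KV_POOL_MARKER not in source:
            continue
        selected.append(name)
    return selected


def neutralize(cls, name):
    """Wrap cls.<name> so the leak ValueError is swallowed (assumed strategy)."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except ValueError as exc:
            if LEAK_MARKER in str(exc):
                return None
            raise  # unrelated errors still propagate

    setattr(cls, name, wrapper)


class ToyScheduler:
    """Hypothetical stand-in for the split 0.5.11+ layout; not SGLang code."""

    def _check_req_pool(self):
        raise ValueError("req_to_token_pool memory leak detected!")

    def _report_leak(self, pool_name):
        raise ValueError(f"{pool_name} memory leak detected!")
```

On `ToyScheduler` the rule selects only `_report_leak`, so `_check_req_pool` keeps raising. A fused method that names both pools, as in the older layout, would still be selected because it mentions `token_to_kv_pool_allocator`.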

Works across versions (verified)

Bug #2 only breaks on 0.5.11+. The two older versions below are regression checks, not bug repros — they confirm the new matcher still neutralizes the KV-pool check exactly like the original patch did:

| SGLang | KV-cache leak check | Request-pool check | Result |
| --- | --- | --- | --- |
| 0.4.9 | `check_memory` (KV + request fused) | (same method) | KV check neutralized (matches original patch behavior) |
| 0.5.10 | `check_memory` (KV) | separate `_check_req_pool` | KV check neutralized, request check left live |
| 0.5.12 | generic `_report_leak` (KV) | separate `_check_req_pool` | KV check neutralized, request check left live (fixes the crash the old patch couldn't) |

No regression on older versions (behavior matches the original patch where it worked); fixes 0.5.11+; and the request-pool alarm is preserved wherever the version's structure keeps it separate.

Manifestation

| | Exists in code? | Manifests on 0.5.12 wheel? | Manifests on no-`__version__` build? |
| --- | --- | --- | --- |
| Bug #1 | ✅ yes | ❌ no (`__version__` present) | ✅ yes |
| Bug #2 | ✅ yes | ✅ yes (leak crash, exit 137) | ✅ yes |

Changes

kvcached/integration/version_utils.py — importlib.metadata fallback in detect_version().
kvcached/integration/sglang/patches.py — generalize the scheduler_memory_leak patch across SGLang's leak-check layouts, skipping the req_to_token_pool-specific check.

Commits:

support SGLang 0.5.11-0.5.12 (version detection + refactored leak check)
leave req_to_token_pool leak check intact

Two general (non-AMD) compatibility defects in the SGLang integration, found while porting kvcached to AMD and verified on NVIDIA (SGLang 0.5.12.post1).

- version_utils: detect_version() only read module attributes (__version__, version, ...). Source builds of SGLang expose none, so detection returned None and every SGLang patch silently bailed. Fall back to importlib.metadata.version().
- sglang/patches: the scheduler_memory_leak patch scanned for a single Scheduler method containing both "memory leak detected" and "token_to_kv_pool_allocator". SGLang 0.5.11+ moved the check into SchedulerRuntimeCheckerMixin and split it across _check_req_pool / _report_leak, so nothing matched and SGLang's leak detector crashed the scheduler. Wrap every Scheduler method whose source raises "memory leak detected" instead -- covers both the old single-method layout and the new split one.

Verified on NVIDIA (SGLang 0.5.12.post1) and AMD MI300X: all 7 SGLang patches apply, kvcached engages, no crash, output md5 matches the no-kvcached baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The previous commit suppressed every Scheduler method whose source mentioned "memory leak detected". On SGLang 0.5.10+ that also wraps _check_req_pool, which guards req_to_token_pool -- a pool kvcached does not manage. Silencing it could hide a genuine request-pool leak.

Skip any check specific to req_to_token_pool; keep the KV/token-pool checks (the old combined check names token_to_kv_pool_allocator; the new generic _report_leak names no pool and is only called for token pools).

Verified against sglang 0.4.9 / 0.5.10 / 0.5.12: the KV-pool check is still neutralized, the request-pool check stays live, all 7 patches apply, no crash, output md5 matches the no-kvcached baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This pull request improves kvcached’s SGLang integration compatibility across multiple SGLang versions by (1) making version detection more robust when sglang.__version__ is missing and (2) refactoring the scheduler leak-check suppression logic to match the reorganized leak-check code paths introduced in SGLang 0.5.11+.

Changes:

Add an importlib.metadata.version() fallback in VersionManager.detect_version() when no module-level version attribute is available.
Generalize the scheduler_memory_leak patch to suppress the relevant leak-raiser methods across both pre-0.5.11 and 0.5.11+ SGLang layouts, while intentionally skipping request-pool-specific checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `kvcached/integration/version_utils.py` | Adds a distribution-metadata fallback for version detection when module attributes are absent. |
| `kvcached/integration/sglang/patches.py` | Refactors the scheduler leak-check patch to match newer SGLang leak-check structure while preserving request-pool leak detection. |

RixinLiu and others added 2 commits June 3, 2026 07:49

Copilot AI review requested due to automatic review settings June 3, 2026 11:55

Copilot started reviewing on behalf of RixinLiu June 3, 2026 11:55

Copilot AI reviewed Jun 3, 2026

RixinLiu requested a review from jiarong0907 June 3, 2026 11:59

RixinLiu mentioned this pull request Jun 3, 2026

feat(AMD): support AMD GPUs (ROCm/HIP) #354

Open

RixinLiu wants to merge 2 commits into main from sglang-0512-compat
