Skip to content

fix(sglang): support SGLang 0.5.11+ (version detection + refactored leak check)#353

Open
RixinLiu wants to merge 2 commits into
mainfrom
sglang-0512-compat
Open

fix(sglang): support SGLang 0.5.11+ (version detection + refactored leak check)#353
RixinLiu wants to merge 2 commits into
mainfrom
sglang-0512-compat

Conversation

@RixinLiu
Copy link
Copy Markdown
Collaborator

@RixinLiu RixinLiu commented Jun 3, 2026

Summary

Two general (non-AMD) compatibility defects in the SGLang integration, found while porting kvcached to AMD. Bug #2 affects SGLang 0.5.11+ (verified on the 0.5.12.post1 wheel); Bug #1 affects any build that doesn't expose sglang.__version__. The fix was additionally checked against 0.4.9 (the oldest version kvcached supports) and 0.5.10 (the last version before the break) to confirm no regression on the older, still-working layouts.

Bug #1 — version detection misses builds without a module-level __version__

VersionManager.detect_version() only reads module attributes (__version__, then version/VERSION/_version). Some SGLang builds (notably source builds) expose none of these on the sglang module, even though the installed-package metadata carries the version. When detection returns None, every SGLang patch bails and kvcached silently becomes a no-op:

[kvcached][WARNING] Could not detect version for sglang
[kvcached][WARNING] Failed to apply elastic_memory_pool / ...
[kvcached][WARNING] Failed to patch sglang: elastic_allocator, ...

Fix: fall back to importlib.metadata.version(library_name) when no module attribute is found.

Bug #2scheduler_memory_leak patch doesn't match SGLang's newer leak-check layout

kvcached maps physical KV pages lazily, so SGLang's static-pool invariant (total == available + in-use) doesn't hold for the KV pool; the scheduler_memory_leak patch exists to neutralize SGLang's leak detector for that pool. The old patch scanned for a single Scheduler method containing both "memory leak detected" and "token_to_kv_pool_allocator".

Through 0.5.10 that still worked (check_memory names the KV allocator). At 0.5.11+, SGLang routes the KV-pool leak through a generic _report_leak(pool_name, …) that names no pool — so no single method has both literals, the patch matches nothing, fails to apply, and SGLang's own detector crashes the scheduler:

[kvcached][WARNING] Failed to apply scheduler_memory_leak
ValueError: pool memory leak detected! [full] total=..., available=..., evictable=0 ...
Received sigquit from a child process.   # exit 137

Fix: select the Scheduler leak-raisers by source ("memory leak detected") but skip any check specific to req_to_token_pool — a pool kvcached does not manage, whose invariant still holds, and whose alarm must stay live so a genuine request-pool leak still surfaces. The KV/token-pool checks are kept (the old combined check names token_to_kv_pool_allocator; the new generic _report_leak names no pool and is only ever called for token pools). This covers the old single-method layout and the new split/generic layout without over-suppressing unrelated alarms.

Works across versions (verified)

Bug #2 only breaks on 0.5.11+. The two older versions below are regression checks, not bug repros — they confirm the new matcher still neutralizes the KV-pool check exactly like the original patch did:

SGLang KV-cache leak check request-pool check result
0.4.9 check_memory (KV + request fused) (same method) KV check neutralized — matches original patch behavior
0.5.10 check_memory (KV) separate _check_req_pool KV check neutralized, request check left live
0.5.12 generic _report_leak (KV) separate _check_req_pool KV check neutralized, request check left live — fixes the crash the old patch couldn't

No regression on older versions (behavior matches the original patch where it worked); fixes 0.5.11+; and the request-pool alarm is preserved wherever the version's structure keeps it separate.

Manifestation

exists in code? manifests on 0.5.12 wheel? manifests on no-__version__ build?
Bug #1 ✅ yes ❌ no (__version__ present) ✅ yes
Bug #2 ✅ yes ✅ yes (leak crash, exit 137) ✅ yes

Changes

  • kvcached/integration/version_utils.pyimportlib.metadata fallback in detect_version().
  • kvcached/integration/sglang/patches.py — generalize the scheduler_memory_leak patch across SGLang's leak-check layouts, skipping the req_to_token_pool-specific check.

Commits:

  • support SGLang 0.5.11-0.5.12 (version detection + refactored leak check)
  • leave req_to_token_pool leak check intact

RixinLiu and others added 2 commits June 3, 2026 07:49
…eak check)

Two general (non-AMD) compatibility defects in the SGLang integration,
found while porting kvcached to AMD and verified on NVIDIA (SGLang
0.5.12.post1).

- version_utils: detect_version() only read module attributes
  (__version__, version, ...). Source builds of SGLang expose none, so
  detection returned None and every SGLang patch silently bailed. Fall
  back to importlib.metadata.version().

- sglang/patches: the scheduler_memory_leak patch scanned for a single
  Scheduler method containing both "memory leak detected" and
  "token_to_kv_pool_allocator". SGLang 0.5.11+ moved the check into
  SchedulerRuntimeCheckerMixin and split it across _check_req_pool /
  _report_leak, so nothing matched and SGLang's leak detector crashed the
  scheduler. Wrap every Scheduler method whose source raises "memory leak
  detected" instead -- covers both the old single-method layout and the
  new split one.

Verified on NVIDIA (SGLang 0.5.12.post1) and AMD MI300X: all 7 SGLang
patches apply, kvcached engages, no crash, output md5 matches the
no-kvcached baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit suppressed every Scheduler method whose source
mentioned "memory leak detected". On SGLang 0.5.10+ that also wraps
_check_req_pool, which guards req_to_token_pool -- a pool kvcached does
not manage. Silencing it could hide a genuine request-pool leak.

Skip any check specific to req_to_token_pool; keep the KV/token-pool
checks (the old combined check names token_to_kv_pool_allocator; the new
generic _report_leak names no pool and is only called for token pools).

Verified against sglang 0.4.9 / 0.5.10 / 0.5.12: the KV-pool check is
still neutralized, the request-pool check stays live, all 7 patches
apply, no crash, output md5 matches the no-kvcached baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 11:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request improves kvcached’s SGLang integration compatibility across multiple SGLang versions by (1) making version detection more robust when sglang.__version__ is missing and (2) refactoring the scheduler leak-check suppression logic to match the reorganized leak-check code paths introduced in SGLang 0.5.11+.

Changes:

  • Add an importlib.metadata.version() fallback in VersionManager.detect_version() when no module-level version attribute is available.
  • Generalize the scheduler_memory_leak patch to suppress the relevant leak-raiser methods across both pre-0.5.11 and 0.5.11+ SGLang layouts, while intentionally skipping request-pool-specific checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
kvcached/integration/version_utils.py Adds a distribution-metadata fallback for version detection when module attributes are absent.
kvcached/integration/sglang/patches.py Refactors the scheduler leak-check patch to match newer SGLang leak-check structure while preserving request-pool leak detection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants