fix(sglang): support SGLang 0.5.11+ (version detection + refactored leak check)#353
Open
RixinLiu wants to merge 2 commits into
Open
fix(sglang): support SGLang 0.5.11+ (version detection + refactored leak check)#353RixinLiu wants to merge 2 commits into
RixinLiu wants to merge 2 commits into
Conversation
…eak check) Two general (non-AMD) compatibility defects in the SGLang integration, found while porting kvcached to AMD and verified on NVIDIA (SGLang 0.5.12.post1). - version_utils: detect_version() only read module attributes (__version__, version, ...). Source builds of SGLang expose none, so detection returned None and every SGLang patch silently bailed. Fall back to importlib.metadata.version(). - sglang/patches: the scheduler_memory_leak patch scanned for a single Scheduler method containing both "memory leak detected" and "token_to_kv_pool_allocator". SGLang 0.5.11+ moved the check into SchedulerRuntimeCheckerMixin and split it across _check_req_pool / _report_leak, so nothing matched and SGLang's leak detector crashed the scheduler. Wrap every Scheduler method whose source raises "memory leak detected" instead -- covers both the old single-method layout and the new split one. Verified on NVIDIA (SGLang 0.5.12.post1) and AMD MI300X: all 7 SGLang patches apply, kvcached engages, no crash, output md5 matches the no-kvcached baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit suppressed every Scheduler method whose source mentioned "memory leak detected". On SGLang 0.5.10+ that also wraps _check_req_pool, which guards req_to_token_pool -- a pool kvcached does not manage. Silencing it could hide a genuine request-pool leak. Skip any check specific to req_to_token_pool; keep the KV/token-pool checks (the old combined check names token_to_kv_pool_allocator; the new generic _report_leak names no pool and is only called for token pools). Verified against sglang 0.4.9 / 0.5.10 / 0.5.12: the KV-pool check is still neutralized, the request-pool check stays live, all 7 patches apply, no crash, output md5 matches the no-kvcached baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This pull request improves kvcached’s SGLang integration compatibility across multiple SGLang versions by (1) making version detection more robust when sglang.__version__ is missing and (2) refactoring the scheduler leak-check suppression logic to match the reorganized leak-check code paths introduced in SGLang 0.5.11+.
Changes:
- Add an
importlib.metadata.version()fallback inVersionManager.detect_version()when no module-level version attribute is available. - Generalize the
scheduler_memory_leakpatch to suppress the relevant leak-raiser methods across both pre-0.5.11 and 0.5.11+ SGLang layouts, while intentionally skipping request-pool-specific checks.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
kvcached/integration/version_utils.py |
Adds a distribution-metadata fallback for version detection when module attributes are absent. |
kvcached/integration/sglang/patches.py |
Refactors the scheduler leak-check patch to match newer SGLang leak-check structure while preserving request-pool leak detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two general (non-AMD) compatibility defects in the SGLang integration, found while porting kvcached to AMD. Bug #2 affects SGLang 0.5.11+ (verified on the
0.5.12.post1wheel); Bug #1 affects any build that doesn't exposesglang.__version__. The fix was additionally checked against0.4.9(the oldest version kvcached supports) and0.5.10(the last version before the break) to confirm no regression on the older, still-working layouts.Bug #1 — version detection misses builds without a module-level
__version__VersionManager.detect_version()only reads module attributes (__version__, thenversion/VERSION/_version). Some SGLang builds (notably source builds) expose none of these on thesglangmodule, even though the installed-package metadata carries the version. When detection returnsNone, every SGLang patch bails and kvcached silently becomes a no-op:Fix: fall back to
importlib.metadata.version(library_name)when no module attribute is found.Bug #2 —
scheduler_memory_leakpatch doesn't match SGLang's newer leak-check layoutkvcached maps physical KV pages lazily, so SGLang's static-pool invariant (
total == available + in-use) doesn't hold for the KV pool; thescheduler_memory_leakpatch exists to neutralize SGLang's leak detector for that pool. The old patch scanned for a singleSchedulermethod containing both"memory leak detected"and"token_to_kv_pool_allocator".Through 0.5.10 that still worked (
check_memorynames the KV allocator). At 0.5.11+, SGLang routes the KV-pool leak through a generic_report_leak(pool_name, …)that names no pool — so no single method has both literals, the patch matches nothing, fails to apply, and SGLang's own detector crashes the scheduler:Fix: select the Scheduler leak-raisers by source (
"memory leak detected") but skip any check specific toreq_to_token_pool— a pool kvcached does not manage, whose invariant still holds, and whose alarm must stay live so a genuine request-pool leak still surfaces. The KV/token-pool checks are kept (the old combined check namestoken_to_kv_pool_allocator; the new generic_report_leaknames no pool and is only ever called for token pools). This covers the old single-method layout and the new split/generic layout without over-suppressing unrelated alarms.Works across versions (verified)
Bug #2 only breaks on 0.5.11+. The two older versions below are regression checks, not bug repros — they confirm the new matcher still neutralizes the KV-pool check exactly like the original patch did:
check_memory(KV + request fused)check_memory(KV)_check_req_pool_report_leak(KV)_check_req_poolNo regression on older versions (behavior matches the original patch where it worked); fixes 0.5.11+; and the request-pool alarm is preserved wherever the version's structure keeps it separate.
Manifestation
__version__build?__version__present)Changes
kvcached/integration/version_utils.py—importlib.metadatafallback indetect_version().kvcached/integration/sglang/patches.py— generalize thescheduler_memory_leakpatch across SGLang's leak-check layouts, skipping thereq_to_token_pool-specific check.Commits:
support SGLang 0.5.11-0.5.12 (version detection + refactored leak check)leave req_to_token_pool leak check intact