
Autopatch fails when ENABLE_KVCACHED is set inside a Python script (requires explicit from kvcached import autopatch) #320

@ztang2370

Description


Summary

When kvcached is used from a custom Python script, the patches only take effect if the script adds from kvcached import autopatch before import vllm / import sglang. Setting os.environ["ENABLE_KVCACHED"] = "1" inside the script does nothing, even though the documentation presents this environment variable as the on/off switch. This is confusing, and it forces a kvcached-specific import into user code.

Related to issue #316.

Reproduction

import os
os.environ["ENABLE_KVCACHED"] = "1"
os.environ["KVCACHED_AUTOPATCH"] = "1"

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.usage.usage_lib import UsageContext
from vllm.v1.engine.async_llm import AsyncLLM

engine_args = AsyncEngineArgs(
    model="tencent/HunyuanOCR",
    trust_remote_code=True,
    gpu_memory_utilization=0.7,
    max_model_len=4096,
    enable_prefix_caching=False,
    max_num_batched_tokens=8192,
    mm_processor_cache_gb=0,
)
vllm_config = engine_args.create_engine_config(usage_context=UsageContext.OPENAI_API_SERVER)

async_llm = AsyncLLM.from_vllm_config(
    vllm_config=vllm_config,
    usage_context=UsageContext.OPENAI_API_SERVER,
    stat_loggers=None,
    enable_log_requests=engine_args.enable_log_requests,
    aggregate_engine_logging=engine_args.aggregate_engine_logging,
    disable_log_stats=engine_args.disable_log_stats,
)

Expected: vllm is patched by kvcached.
Actual: vllm runs unpatched. Patching only happens if either:

  1. ENABLE_KVCACHED=1 is exported in the shell before launching Python, or
  2. from kvcached import autopatch is added to the script before import vllm.

Root cause

The autopatch entry point is kvcached_autopatch.pth:

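# kvcached_autopatch.pth (wrapped here for readability; in the file this is one line, since site runs each .pth line on its own)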
import os, importlib, importlib.util; (
    os.environ.setdefault("KVCACHED_AUTOPATCH", "1"),
    getattr(importlib.import_module("kvcached.autopatch"), "autopatch_all", lambda: None)()
) if os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
  and importlib.util.find_spec("kvcached.autopatch") is not None else None

Python's site module processes .pth files in site-packages at interpreter startup, before any user code runs, and executes every line that begins with import. So:

Shell-exported ENABLE_KVCACHED=1 → .pth sees it → calls autopatch_all() → registers @when_imported("vllm") / @when_imported("sglang") hooks → patches apply when the user imports vllm/sglang. ✅

os.environ["ENABLE_KVCACHED"]="1" set inside the script → executes after the .pth already short-circuited → autopatch_all() was never called → no when_imported hooks were registered → import vllm triggers nothing. ❌

Setting KVCACHED_AUTOPATCH inside the script does not help either. It is read later by _env_enabled() in kvcached/integration/vllm/autopatch.py, but only from inside the hooks, which were never registered, so on its own it has no effect.
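The ordering problem can be reproduced without kvcached or vllm. The sketch below is a simulation under stated assumptions: site.addsitedir() stands in for interpreter startup (it processes the .pth files in the given directory the same way startup does), and PROBE_ENABLE, probe.pth and the probe_hooks_* module are hypothetical stand-ins for ENABLE_KVCACHED, kvcached_autopatch.pth and kvcached.autopatch.

```python
import importlib
import os
import site
import tempfile
import uuid
from pathlib import Path

ENV = "PROBE_ENABLE"  # hypothetical stand-in for ENABLE_KVCACHED


def simulate(set_env_before_startup: bool) -> list:
    """Run one simulated 'interpreter startup + user script' and report hooks."""
    os.environ.pop(ENV, None)
    mod = f"probe_hooks_{uuid.uuid4().hex}"  # fresh name, so no sys.modules reuse
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        # Hypothetical stand-in for kvcached.autopatch: just records registration.
        (d / f"{mod}.py").write_text(
            "REGISTERED = []\n"
            "def autopatch_all():\n"
            "    REGISTERED.extend(['vllm', 'sglang'])\n"
        )
        # Same shape as the gated kvcached_autopatch.pth: one line, env read once.
        (d / "probe.pth").write_text(
            f'import os, importlib; importlib.import_module("{mod}").autopatch_all() '
            f'if os.getenv("{ENV}", "false").lower() in ("true", "1") else None\n'
        )
        if set_env_before_startup:
            os.environ[ENV] = "1"  # like `export ENABLE_KVCACHED=1` in the shell
        site.addsitedir(tmp)       # executes probe.pth, as startup would
        os.environ[ENV] = "1"      # like os.environ[...] = "1" inside the script
        return importlib.import_module(mod).REGISTERED


print(simulate(set_env_before_startup=True))   # ['vllm', 'sglang']
print(simulate(set_env_before_startup=False))  # []
```

Because the variable is read exactly once, when the .pth line runs, assigning it afterwards cannot change the outcome.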

Proposed fix

Decouple hook registration (must happen at interpreter startup) from the enable check (should happen at vllm/sglang-import time, so env vars set inside the script are honored).

kvcached_autopatch.pth: always register hooks; drop the ENABLE_KVCACHED gate.

import importlib, importlib.util; importlib.import_module("kvcached.autopatch").autopatch_all() if importlib.util.find_spec("kvcached.autopatch") is not None else None

Change _env_enabled in kvcached/integration/vllm/autopatch.py and kvcached/integration/sglang/autopatch.py to accept either env var, so ENABLE_KVCACHED works as documented:

import os


def _env_enabled() -> bool:
    # Either variable enables patching; evaluated when vllm/sglang is imported.
    return (
        os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
        or os.getenv("KVCACHED_AUTOPATCH", "false").lower() in ("true", "1")
    )

After this change, setting ENABLE_KVCACHED=1 (or KVCACHED_AUTOPATCH=1) inside the user's script at any point before import vllm is enough; no source-level from kvcached import autopatch is required.

Cost: every Python startup on a system where kvcached is installed imports kvcached.autopatch and registers the two when_imported hooks. This is cheap (it does not import vllm or sglang) but not free. An optional KVCACHED_DISABLE_AUTOPATCH=1 escape hatch in the .pth would preserve a fully-off mode.
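A sketch of that escape hatch as a .pth file, assuming the KVCACHED_DISABLE_AUTOPATCH name proposed above. The extra find_spec("kvcached") check is an addition not in the proposal: importlib.util.find_spec() imports the parent package of a dotted name and raises ModuleNotFoundError if that parent is missing, so checking the package first makes the line a no-op rather than an error when kvcached itself is absent.

```python
# kvcached_autopatch.pth (sketch of the proposed escape hatch). The statement below
# must stay on ONE physical line: site skips lines starting with "#" and executes
# each line that starts with "import" as a standalone statement.
import importlib, importlib.util, os; importlib.import_module("kvcached.autopatch").autopatch_all() if os.getenv("KVCACHED_DISABLE_AUTOPATCH", "false").lower() not in ("true", "1") and importlib.util.find_spec("kvcached") is not None and importlib.util.find_spec("kvcached.autopatch") is not None else None
```

With KVCACHED_DISABLE_AUTOPATCH=1 exported in the shell, the line short-circuits before importing anything from kvcached, which restores today's zero-cost startup.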

Workarounds (current behavior)

  1. Export ENABLE_KVCACHED=1 in the shell before launching Python, or
  2. Add from kvcached import autopatch before any import vllm / import sglang.
