_get_max_cached_blocks truncates small positive KVCACHED_MAX_CACHED_TOKENS to 0 (disabled) #343

@cui36

Description

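kvcached/integration/vllm/patches.py:256 computes `MAX_CACHED_TOKENS // block_size`. With e.g. `KVCACHED_MAX_CACHED_TOKENS=8` and `block_size=16`, the floor division returns 0. Result: any positive token budget smaller than one block is treated as "no cached blocks", so prefix caching is silently turned off.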

Fix: maybe we should add a lower bound, like:

```python
return max(1, MAX_CACHED_TOKENS // block_size)
```
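To make the arithmetic easy to check outside of vLLM, here is a minimal standalone sketch. The function names `blocks_floor` and `blocks_clamped` are hypothetical and are not the actual code in `patches.py`. The sketch also assumes that a non-positive budget is meant to disable caching on purpose, so it keeps returning 0 in that case instead of clamping to 1; if that assumption is wrong for kvcached, the plain `max(1, ...)` above is enough.

```python
def blocks_floor(max_cached_tokens: int, block_size: int) -> int:
    """Behaviour described in this issue: plain floor division."""
    return max_cached_tokens // block_size


def blocks_clamped(max_cached_tokens: int, block_size: int) -> int:
    """Possible fix: a positive token budget yields at least one block."""
    if max_cached_tokens <= 0:
        # Assumption: a non-positive budget is an explicit request to disable caching.
        return 0
    return max(1, max_cached_tokens // block_size)


print(blocks_floor(8, 16))     # 0: caching silently disabled
print(blocks_clamped(8, 16))   # 1: small positive budget keeps one block
print(blocks_clamped(0, 16))   # 0: explicit disable is preserved
print(blocks_clamped(64, 16))  # 4: larger budgets are unchanged
```

Note that clamping to one block means the cache can hold `block_size` tokens, which is more than the configured budget of 8 in the example; logging a warning instead of clamping would be another option.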
