kvcached/integration/vllm/patches.py:256 computes MAX_CACHED_TOKENS // block_size. This is integer floor division, so whenever MAX_CACHED_TOKENS is smaller than block_size the result is 0, e.g. KVCACHED_MAX_CACHED_TOKENS=8 with block_size=16 gives 8 // 16 = 0. Result: prefix caching is silently turned off.
Fix: maybe we should add a lower bound, like:
    return max(1, MAX_CACHED_TOKENS // block_size)
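
To make the suggestion concrete, here is a minimal standalone sketch of the clamp. The helper name `max_cached_blocks` and the warning message are hypothetical and not taken from patches.py; it also assumes that a budget smaller than one block should round up to one block rather than disable caching, which is worth confirming with the maintainers.

```python
import logging

logger = logging.getLogger(__name__)


def max_cached_blocks(max_cached_tokens: int, block_size: int) -> int:
    """Hypothetical helper: turn a token budget into a block count, never below 1."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Floor division: any budget smaller than one block yields 0 here.
    blocks = max_cached_tokens // block_size
    if blocks < 1:
        # Assumption: round up to a single block and say so, instead of
        # returning 0 and disabling prefix caching without any notice.
        logger.warning(
            "max_cached_tokens=%d is smaller than block_size=%d; using 1 block",
            max_cached_tokens,
            block_size,
        )
        return 1
    return blocks
```

One trade-off: with the clamp, a budget of 8 tokens still caches one full 16-token block, so the effective limit exceeds the configured value. If that is not acceptable, logging a warning and keeping 0 would be the alternative.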