rally to refactor the disk prefix caching #1420

magikRUKKOLA · 2025-07-05T20:45:01Z


First of all, I would like to express my gratitude towards the authors of the KTransformers framework.
But...
Apparently we are having issues with prefix caching...
The problem is that it's really hard to navigate the snapshots of the KV cache.
New data is getting appended (! WTF?!!!) to a binary file.
Not a text file, not a JSON file, not an XML file, but a binary file.
I can understand the benefits of a binary format.
Its efficiency, of course.
But...
If you are advocating for efficiency...
It seems quite logical to provide additional tools to manage this binary format.
Unfortunately, right now, management of the KV cache snapshots (the ones produced by the prefill cache module of the balance_serve backend of KTransformers) is NONEXISTENT! Whether the async query manager was or wasn't able to find some prefix is all the user sees. This is not what we want. We want full control!
So...
I think I'm speaking for the whole community right now.
And it basically means that we need clear documentation of the KV cache format...
And an additional API to save (or restore) the state of the LLM.
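
To make the request concrete, here is a minimal sketch of the kind of tooling I mean. It assumes a hypothetical length-prefixed record layout: the magic bytes, the header fields, and the names `append_snapshot` and `list_snapshots` are all invented for illustration and are NOT the actual KTransformers on-disk format or API. The point is only that an append-only binary file can still be enumerated if every record carries a small documented header.

```python
import io
import struct

# Hypothetical record header (an assumption, not the real KTransformers format):
# 4-byte magic, 64-bit prefix hash, 32-bit payload length, little-endian.
HEADER = struct.Struct("<4sQI")
MAGIC = b"KVS0"


def append_snapshot(f, prefix_hash, payload):
    """Append one snapshot record to an open binary file object."""
    f.seek(0, io.SEEK_END)
    f.write(HEADER.pack(MAGIC, prefix_hash, len(payload)))
    f.write(payload)


def list_snapshots(f):
    """Return (offset, prefix_hash, length) per record without reading payloads."""
    f.seek(0)
    entries = []
    while True:
        offset = f.tell()
        raw = f.read(HEADER.size)
        if len(raw) < HEADER.size:
            break  # end of file reached
        magic, prefix_hash, length = HEADER.unpack(raw)
        if magic != MAGIC:
            raise ValueError(f"corrupt record at offset {offset}")
        entries.append((offset, prefix_hash, length))
        f.seek(length, io.SEEK_CUR)  # skip the payload
    return entries
```

With something like this, a user could at least list which prefixes are stored and where, instead of only learning whether a lookup hit or missed.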
Thank you for your attention.
train noise