Add documentation and sample on KV cache eviction #1960
site/docs/concepts/optimization-techniques/kvcache-eviction-algorithm.md
## Conceptual Model
The KV cache for each sequence is divided into three logical areas:
Images should be placed in the site/static/img directory and referenced here from that directory.
Done
@@ -0,0 +1,247 @@
import gc
I think this is an example that should be moved to the samples folder.
Done
LGTM
* Start Area: Initial tokens that are never evicted
* Evictable Area: Tokens that can be evicted based on importance scores
* Recent Area: Most recent tokens that are preserved (never evicted)
Suggested change:
- * Recent Area: Most recent tokens that are preserved (never evicted)
+ * Recent Area: Most recent tokens that are preserved but migrate to the Evictable Area with iterations of text generation.
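To illustrate the point of this suggestion, here is a minimal sketch of how a token that starts in the Recent Area can end up in the Evictable Area as the sequence grows. The function names `partition_areas` and `area_of`, and the parameters `start_size` and `recent_size`, are hypothetical and chosen only for this illustration. The sketch also assumes, as a simplification, that no tokens have been evicted yet, so token positions stay fixed.

```python
def partition_areas(seq_len, start_size, recent_size):
    """Split token positions [0, seq_len) into (start, evictable, recent) ranges.

    Hypothetical helper for illustration only; the sizes are token counts.
    """
    if seq_len < 0 or start_size < 0 or recent_size < 0:
        raise ValueError("sizes must be non-negative")
    # The start area covers the first start_size tokens (or fewer if the
    # sequence is still shorter than that).
    start_end = min(start_size, seq_len)
    # The recent area covers the last recent_size tokens, but never overlaps
    # the start area.
    recent_begin = max(start_end, seq_len - recent_size)
    return (
        range(0, start_end),
        range(start_end, recent_begin),
        range(recent_begin, seq_len),
    )


def area_of(pos, seq_len, start_size, recent_size):
    """Return which area the token at position pos currently belongs to."""
    start, evictable, recent = partition_areas(seq_len, start_size, recent_size)
    if pos in start:
        return "start"
    if pos in evictable:
        return "evictable"
    if pos in recent:
        return "recent"
    raise IndexError("position is outside the sequence")


# The same token (position 10) is "recent" while the sequence is short and
# becomes "evictable" once enough new tokens have been generated.
early = area_of(10, seq_len=12, start_size=4, recent_size=4)
later = area_of(10, seq_len=20, start_size=4, recent_size=4)
```

Under these assumptions, the Recent Area behaves like a sliding window at the end of the sequence: a token is protected only while it is among the last `recent_size` positions.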
Done
Well-written! I would add a paragraph at the end like "Areas/Subjects for improvements".
Done
No description provided.