Add eager release for readvalue and kvcache #32916
Draft
+48
−0
Details:
A proposal to eagerly release the output memory of kvcache-related primitives, so that unnecessary memory/tensors are not kept alive during the idle state.
Currently the readvalue primitive keeps a reference to the "past state" in its output, which leads to a memory spike because the kvcache is duplicated. This cannot be fully resolved, since the outputs variable is used during the kvcache node's execution, but we could release it after the whole graph has executed.
The kvcache primitive also keeps a reference to the "new state" in its output, which should also be held by the variable state. This prevents us from releasing the kvcache explicitly in OVEP.
Memory reclaim/release only happens during primitive execution, so the idle state suffers from high memory usage.
With this change, memory usage drops right before returning from execution, and releasing (resetting) the variable state correctly releases the memory.
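The ownership problem described above can be illustrated with a minimal, self-contained sketch. It is not the plugin code: `Buffer`, `VariableState`, `Primitive`, `release_outputs` and `buffer_survives_reset` are hypothetical names, and the sketch only assumes that both the variable state and the primitive's outputs hold shared references to the same buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real plugin classes.
struct Buffer {
    explicit Buffer(std::size_t bytes) : data(bytes) {}
    std::vector<char> data;
};

// Assumed owner of the kvcache memory between inferences.
struct VariableState {
    std::shared_ptr<Buffer> memory;
    void reset() { memory.reset(); }
};

// Assumed primitive that keeps its output buffers referenced after execution.
struct Primitive {
    std::vector<std::shared_ptr<Buffer>> outputs;
    // Eager release: drop the output references once the graph has finished.
    void release_outputs() { outputs.clear(); }
};

// Returns true if the buffer is still allocated after the state is reset.
bool buffer_survives_reset(bool eager_release) {
    VariableState state;
    Primitive kv_cache;

    state.memory = std::make_shared<Buffer>(1024);
    kv_cache.outputs.push_back(state.memory);    // second owner of the buffer
    std::weak_ptr<Buffer> probe = state.memory;  // observes without owning

    if (eager_release) {
        kv_cache.release_outputs();  // done at the end of graph execution
    }
    state.reset();                   // what a caller such as OVEP would trigger
    return !probe.expired();
}
```

In this sketch, `buffer_survives_reset(false)` reports that the buffer outlives the reset because the primitive's output still references it, while `buffer_survives_reset(true)` reports that the buffer is freed. The sketch is single-threaded and does not model the synchronization concerns.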
There are some concerns about synchronization and data races, so please let us know how this can be made to work properly.
Tickets: