Add eager release for readvalue and kvcache #32916

ZackyLake · 2025-11-18T18:02:53Z

Details:

A proposal to eagerly release the output memory of kvcache-related primitives, so that unnecessary memory/tensors are not kept alive while idle.

Currently the readvalue primitive keeps a reference to the "past state" in its output, which leads to a memory spike due to a duplicated kvcache. This is not fully resolvable, since the output is still used during the kvcache node's execution, but we can release it after the whole graph has executed.

The kvcache primitive also keeps a reference to the "new state" in its output, which should also be held by the variable state. This prevents us from releasing the kvcache explicitly in OVEP.
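The ownership problem described above can be illustrated with a minimal sketch. It assumes reference-counted buffers modeled with `std::shared_ptr`; `Buffer`, `VariableState`, `Primitive`, and `buffer_survives_state_reset` are hypothetical stand-ins for illustration, not the GPU plugin's actual types.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are NOT the OpenVINO GPU plugin's real types.
struct Buffer {
    explicit Buffer(std::size_t n) : bytes(n) {}
    std::vector<char> bytes;
};

struct VariableState {
    std::shared_ptr<Buffer> memory;    // the state that should own the kvcache
    void reset() { memory.reset(); }   // explicit release requested by the caller
};

struct Primitive {
    std::vector<std::shared_ptr<Buffer>> outputs;  // output references kept after execution
};

// Returns true if the buffer is still alive after the variable state is reset.
inline bool buffer_survives_state_reset(bool primitive_keeps_output) {
    VariableState state;
    state.memory = std::make_shared<Buffer>(1024);
    std::weak_ptr<Buffer> probe = state.memory;    // observes lifetime without owning

    Primitive kv_cache;
    if (primitive_keeps_output) {
        kv_cache.outputs.push_back(state.memory);  // extra reference to the "new state"
    }

    state.reset();            // the caller tries to free the cache explicitly
    return !probe.expired();  // still alive only if another owner remains
}
```

In this model, `buffer_survives_state_reset(true)` reports that the buffer outlives the reset, because the primitive's output is a second owner; with `false` the reset frees it immediately.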

Memory is currently reclaimed/released only during primitive execution, so memory usage stays high while idle.

With this change, memory usage drops right before execution returns, and releasing (resetting) the variable state correctly releases the memory.
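A rough sketch of the intended effect, under simplifying assumptions: buffers are modeled with `std::shared_ptr`, one graph run consists of a readvalue node followed by a kvcache node, and the release happens as the last step of the run. All names here (`Buffer`, `VariableState`, `Node`, `RunResult`, `run_graph_once`) are hypothetical and do not reflect the actual implementation in this PR.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model; names and structure are assumptions for illustration only.
struct Buffer {
    explicit Buffer(std::size_t n) : bytes(n) {}
    std::vector<char> bytes;
};

struct VariableState {
    std::shared_ptr<Buffer> memory;  // current kvcache contents
};

struct Node {
    std::shared_ptr<Buffer> output;  // output reference kept by the primitive
};

struct RunResult {
    bool past_state_alive;  // is the old cache buffer still allocated?
    long new_state_refs;    // how many owners the new cache buffer has
};

inline RunResult run_graph_once(bool eager_release) {
    VariableState state{std::make_shared<Buffer>(1024)};
    std::weak_ptr<Buffer> past_probe = state.memory;  // observes the old buffer
    Node read_value, kv_cache;

    // readvalue exposes the past state as its output.
    read_value.output = state.memory;
    // kvcache consumes that output and produces a larger "new state".
    kv_cache.output = std::make_shared<Buffer>(read_value.output->bytes.size() + 64);
    state.memory = kv_cache.output;  // the variable state now holds the new cache

    if (eager_release) {
        // Drop the primitives' output references once the whole graph has run.
        read_value.output.reset();
        kv_cache.output.reset();
    }
    return RunResult{!past_probe.expired(), state.memory.use_count()};
}
```

Without the eager release, the old buffer stays allocated next to the new one (the duplicated kvcache) and the new buffer has two owners. With it, only the variable state owns the cache when the run returns, so resetting the state would actually free the memory.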

There are some concerns about synchronization and data races, so please let us know how to make this work properly.

Tickets:

CVS-176852

Add eager release for readvalue and kvcache

e160ea9

github-actions bot added the category: GPU OpenVINO GPU plugin label Nov 18, 2025

sys-openvino-ci added the ExternalIntelPR External contributor from Intel label Nov 18, 2025
