gemma4_31b (MLX): don't allocate a second KV-cache copy for single-session runs by metascroy · Pull Request #20560 · pytorch/executorch

metascroy · 2026-06-27T00:48:30Z

Since #20473, the MLX engine always enables the multi-session mutable-state path, so create_session() allocates a per-session KV-cache copy on top of the program's default buffers. Once a session exists, rebind never uses the default buffers again, so single-session runs end up holding two full KV caches.

This PR gates the MLX mutable_state creation on config.max_sessions > 1. The single-session CLI runner (max_sessions = 1) then executes against the default buffers (one KV cache, as before #20473), while the multi-session worker is unchanged.

Result: lower peak memory and recovered prefill tok/s for single-session runs (significant at long context); decode is unaffected.

Test: run make gemma4_31b-mlx, then run the runner. Peak memory drops and prefill recovers; multi-session isolation still works.
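The gating can be illustrated with a toy model. This is a minimal sketch and not the ExecuTorch code: `ToyEngine`, `EngineConfig`, `gate_on_max_sessions`, and the buffer bookkeeping are hypothetical stand-ins. Only `max_sessions`, `create_session()`, and the `max_sessions > 1` condition come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class EngineConfig:
    # Only `max_sessions` is taken from the PR description.
    max_sessions: int = 1


@dataclass
class ToyEngine:
    """Hypothetical stand-in for the MLX engine; tracks KV-cache-sized buffers."""

    config: EngineConfig
    # False models the behaviour after #20473 (mutable state always enabled);
    # True models the gating proposed in this PR.
    gate_on_max_sessions: bool = True
    kv_buffers: list = field(default_factory=list)

    def __post_init__(self):
        # The program's default mutable buffers always exist.
        self.kv_buffers.append("default")
        if self.gate_on_max_sessions:
            self.mutable_state_enabled = self.config.max_sessions > 1
        else:
            self.mutable_state_enabled = True

    def create_session(self) -> str:
        # With mutable state enabled, each session gets its own KV-cache copy.
        if self.mutable_state_enabled:
            name = f"session-{len(self.kv_buffers)}"
            self.kv_buffers.append(name)
            return name
        # Otherwise the session runs against the default buffers.
        return "default"


# Single-session runner, without and with the gate.
before = ToyEngine(EngineConfig(max_sessions=1), gate_on_max_sessions=False)
before.create_session()
after = ToyEngine(EngineConfig(max_sessions=1))
after.create_session()
print(len(before.kv_buffers), len(after.kv_buffers))  # prints "2 1"
```

In this model the ungated single-session engine holds two KV-cache-sized buffers and the gated one holds one, which mirrors the regression and the fix described above.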

Follow-up: This only fixes the single-session case. Under multi-session the program's default mutable buffers are still allocated but go unused once any session is created, so the worker pays one extra dead KV-cache copy (N+1 for N sessions). A follow-up could release or donate the default buffers to the first session under multi-session, with a check that the init chain only zero-initializes KV state (per-session buffers are freshly zeroed and don't replay it).
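The N+1 accounting can be written out as a small counting function. This is a sketch under stated assumptions: `kv_cache_copies` and its `donate_default` flag are hypothetical names, and the donation branch models the proposed follow-up, which this PR does not implement.

```python
def kv_cache_copies(sessions: int, max_sessions: int,
                    donate_default: bool = False) -> int:
    """Count KV-cache-sized allocations in a toy model of the engine.

    `donate_default` models the hypothetical follow-up in which the default
    buffers are handed to the first session instead of sitting unused.
    """
    if sessions < 0 or sessions > max_sessions:
        raise ValueError("sessions must be between 0 and max_sessions")
    if max_sessions <= 1 or sessions == 0:
        # Single-session mode, or no session yet: only the default buffers.
        return 1
    if donate_default:
        # The first session reuses the default buffers; the rest allocate.
        return sessions
    # Each session has its own copy, plus the unused default buffers.
    return sessions + 1


print(kv_cache_copies(1, 1))                       # prints 1
print(kv_cache_copies(3, 4))                       # prints 4
print(kv_cache_copies(3, 4, donate_default=True))  # prints 3
```

The model ignores the init-chain check mentioned above; it only counts buffers.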

pytorch-bot · 2026-06-27T00:48:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20560

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Unclassified Failures

As of commit 74e77f5 with merge base 51729bb:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/__w/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:722:48: error: ‘inmemoryfs::InMemoryFileSystem::InMemoryNode::Kind’ has not been declared
Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_aarch64

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-27T00:49:18Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

up

74e77f5

meta-cla Bot added the CLA Signed label Jun 27, 2026

metascroy requested review from Gasoonjia and digantdesai June 27, 2026 00:48

metascroy temporarily deployed to cadence June 27, 2026 00:49 — with GitHub Actions Inactive

metascroy wants to merge 1 commit into main from fix-mem-perf-regression
