Per-occurrence KV cache for transformer_block_repeat_config #19324
Open
pssrawat wants to merge 2 commits into pytorch:main from
Conversation
Summary:
Add configurable block repetition to MultimodalTransformer, enabling weight-shared depth scaling. A contiguous range of transformer layers can now be executed multiple times with shared weights.
Add block_repeat_config field to ModelArgs (list of {start, end, count} dicts)
Example params.json:
"block_repeat_config": [{"start": 5, "end": 10, "count": 2}]
Reviewed By: AdithyaSagar007
Differential Revision: D102393826
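For illustration, below is a minimal sketch (not the actual ExecuTorch implementation) of how a layer schedule could be expanded from such a config. The function name `build_layer_schedule`, the `n_layers` argument, and the assumption that `end` is exclusive are all illustrative, not taken from this diff.

```python
from typing import Dict, List


def build_layer_schedule(n_layers: int, block_repeat_config: List[Dict[str, int]]) -> List[int]:
    """Expand per-range repeats into a flat list of layer indices in execution
    order; repeated ranges reuse the same layer objects (shared weights)."""
    schedule: List[int] = []
    i = 0
    while i < n_layers:
        # Find a repeat range starting at this layer, if any.
        repeat = next((r for r in (block_repeat_config or []) if r["start"] == i), None)
        if repeat is None:
            schedule.append(i)
            i += 1
        else:
            block = list(range(repeat["start"], repeat["end"]))  # assumes `end` is exclusive
            schedule.extend(block * repeat["count"])
            i = repeat["end"]
    return schedule


# With the example config above, layers 5-9 are visited twice back to back:
# build_layer_schedule(12, [{"start": 5, "end": 10, "count": 2}])
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11]
```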
Summary: Currently, when a TransformerBlock appears multiple times in MultimodalTransformer.layer_schedule (via `args.transformer_block_repeat_config`), each visit to that layer reads and writes the same `self.attention.kv_cache` buffer. The repeated layer therefore shares its K/V history across both visits; this is a "weight-shared loop with shared KV", which is not numerically equivalent to a physically unrolled N-layer model where each duplicated layer slot owns its own K/V cache.

This diff adds an opt-in path so that each occurrence in the schedule can use its own KV cache buffer while still sharing the layer's weight Parameters, giving the same numerical inference behavior as lowering an unrolled checkpoint. The model size (with transformer_block_repeat_config) remains the same as the original model.

Differential Revision: D103962616
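As a rough illustration of the opt-in path described above, the sketch below allocates one K/V buffer per schedule occurrence while the layer weights stay shared. The class and helper names (`KVCache`, `make_per_occurrence_caches`) are hypothetical and are not the identifiers used in this diff.

```python
import torch
import torch.nn as nn


class KVCache(nn.Module):
    """One K/V buffer pair; each occurrence in the schedule owns its own instance."""

    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        super().__init__()
        self.register_buffer("k", torch.zeros(1, n_heads, max_seq_len, head_dim))
        self.register_buffer("v", torch.zeros(1, n_heads, max_seq_len, head_dim))


def make_per_occurrence_caches(layer_schedule, max_seq_len, n_heads, head_dim):
    """Allocate one KVCache per schedule position, so a weight-shared layer
    that appears twice gets two independent K/V histories."""
    return nn.ModuleList(KVCache(max_seq_len, n_heads, head_dim) for _ in layer_schedule)


# In the forward pass, caches would then be indexed by schedule position
# rather than by layer id, e.g.:
#
#   for pos, layer_idx in enumerate(layer_schedule):
#       h = layers[layer_idx](h, kv_cache=caches[pos])  # shared weights, per-occurrence cache
```

Only the K/V buffers are duplicated per occurrence; the attention, feed-forward, and norm Parameters remain a single shared copy, which is consistent with the statement that the model size does not change.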
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19324
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Cancelled Jobs, 2 Unrelated Failures as of commit 167842a with merge base 1debeb6.
CANCELLED JOBS - The following jobs were cancelled. Please retry.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Contributor
@pssrawat has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103962616.
Contributor
kimishpatel requested changes on May 6, 2026
Review automatically exported from Phabricator review in Meta.