[ET-VK][ez] Implement helper functions to get fastest moving dim #17107

SS-JIA · 2026-02-02T17:14:00Z

Stack from ghstack (oldest at bottom):

Add C++ and GLSL helpers to query the fastest moving dimension (the
dimension with stride 1 in the buffer layout). This is useful for optimizing
memory access patterns in shaders, since iterating along the fastest moving
dimension touches contiguous memory and therefore maximizes cache locality.
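
A minimal sketch of what "the dimension with stride 1 in the buffer layout" means, assuming strides are listed in NCHW (outermost-first) order as in PyTorch. The helper names `fastest_nchw_dim` and `nchw_to_whcn` are hypothetical and are not part of the ET-VK API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the NCHW-order index of the dimension whose
// stride is 1, or -1 if no dimension has stride 1. Scans from the innermost
// dim, so if several dims report stride 1 (e.g. size-1 dims), the innermost wins.
int fastest_nchw_dim(const std::vector<int64_t>& strides) {
  for (int d = static_cast<int>(strides.size()) - 1; d >= 0; --d) {
    if (strides[d] == 1) {
      return d;
    }
  }
  return -1;
}

// Hypothetical helper: converts an NCHW-order dim index to WHCN order
// (0 = W, 1 = H, 2 = C, 3 = N for a 4D tensor). Propagates -1 unchanged.
int nchw_to_whcn(int nchw_dim, int ndim) {
  if (nchw_dim < 0) {
    return -1;
  }
  return ndim - 1 - nchw_dim;
}
```

For sizes (N=2, C=3, H=4, W=5), contiguous strides {60, 20, 5, 1} map to WHCN dim 0 (width), while channels-last strides {60, 1, 15, 3} map to WHCN dim 2 (channels).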

The C++ `fastest_whcn_dim()` method accounts for block-transposed layouts:
for those it returns `outer_packed_dim` instead of `packed_dim`. A
corresponding GLSL macro extracts the same information from the hashed layout.
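
The selection rule above can be sketched as follows. The struct `PackedDimInfoSketch`, its `is_block_transposed` flag, and the bit layout decoded by `hashed_fastest_whcn_dim` are all hypothetical; the real ET-VK types and the hashed-layout encoding read by the GLSL macro may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the packed-dim metadata described in the PR.
// Field names packed_dim / outer_packed_dim mirror the PR text; the struct
// name and the is_block_transposed flag are assumptions.
struct PackedDimInfoSketch {
  int32_t packed_dim;        // WHCN dim packed within a block
  int32_t outer_packed_dim;  // WHCN dim along which blocks are laid out
  bool is_block_transposed;  // assumed flag marking block-transposed layouts
};

// For block-transposed layouts, report outer_packed_dim as the fastest moving
// WHCN dim; otherwise report packed_dim.
int32_t fastest_whcn_dim(const PackedDimInfoSketch& info) {
  return info.is_block_transposed ? info.outer_packed_dim : info.packed_dim;
}

// Hypothetical bit layout for a "hashed layout" integer (NOT the actual ET-VK
// encoding): bits 0-3 = packed_dim, bits 4-7 = outer_packed_dim,
// bit 8 = block-transposed flag. Mirrors how a GLSL macro might use shifts.
constexpr int32_t hashed_fastest_whcn_dim(int32_t hashed_layout) {
  return ((hashed_layout >> 8) & 0x1) != 0 ? (hashed_layout >> 4) & 0xF
                                           : hashed_layout & 0xF;
}
```

Keeping the decode as bit shifts on a single integer matches the idea of packing layout metadata into one value that a shader can cheaply unpack, though the actual field positions are an assumption here.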

Differential Revision: D92061369


pytorch-bot · 2026-02-02T17:14:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17107

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Pending, 3 Unrelated Failures

As of commit 2b6baec with merge base 1cffd23:

NEW FAILURES - The following jobs have failed:

pull / android / run-emulator (gh)
The process '/usr/bin/sh' failed with exit code 1
pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t d2438cb1cc553f7fe992671b0409422851af8958d8b3fd36ed1733bfb98a7a6a /exec failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 4183d7acaa2fc28db6d56bb6b2d4eaf6e4ed7d4d36a817dbe90d9fb39982ec56 /exec failed with exit code 1
Test CUDA Builds / export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh)
RuntimeError: Command docker exec -t 5a4655e3b6f1cfeafe0da8aec8421573c8c53a3b1deefd854fd8f2abcced92b6 /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-samsung-quantmodels-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-02T17:15:01Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Pull Request resolved: #17107

Add C++ and GLSL helpers to query the fastest moving dimension (the dimension with stride 1 in buffer layout). This is useful for optimizing memory access patterns in shaders, as iterating along the fastest moving dimension maximizes cache locality. The C++ `fastest_whcn_dim()` method accounts for block-transposed layouts by returning `outer_packed_dim` instead of `packed_dim` when applicable. A corresponding GLSL macro extracts this info from the hashed layout.

ghstack-source-id: 338638547
@exported-using-ghexport

Differential Revision: [D92061369](https://our.internmc.facebook.com/intern/diff/D92061369/)

This was referenced Feb 2, 2026

[ET-VK][testing] Add per-shader timing breakdown to benchmark output #17105

Merged

[ET-VK][quantization] Implement layout-flexible quantize/dequantize operators #17106

Merged

[ET-VK][qconv] Add layout-flexible impl of quantized depthwise conv2d #17108

Merged

meta-cla bot added the CLA Signed label Feb 2, 2026

This was referenced Feb 3, 2026

[ET-VK] Add alignment fields to PackedDimInfo for padded size calculation #17170

Merged

[ET-VK][quantization] Add layout-flexible clone for int8x4 tensors #17171

Merged

meta-codesync bot added fb-exported meta-exported labels Feb 3, 2026

This was referenced Feb 4, 2026

[ET-VK][qconv] Add layout-agnostic general shader for quantized conv #17219

Merged

[ET-VK][testing] Create dedicated test binary for pointwise convolutions #17220

Merged

[ET-VK][qconv] Add flexible layout impl for quantized pointwise conv #17221

Merged

SS-JIA mentioned this pull request Feb 5, 2026

[ET-VK][qconv] Add flexible layout impl for im2col #17249

Merged

manuelcandales approved these changes Feb 5, 2026


meta-codesync bot merged commit 89dd9ae into gh/SS-JIA/400/base Feb 5, 2026
163 of 171 checks passed

meta-codesync bot deleted the gh/SS-JIA/400/head branch February 5, 2026 23:28

meta-codesync bot temporarily deployed to cherry-pick-bot February 5, 2026 23:28 (inactive)

pytorchbot mentioned this pull request Feb 5, 2026

[ET-VK][ez] Implement helper functions to get fastest moving dim #17262

Merged
