[ET-VK][qconv] Add layout-flexible impl of quantized depthwise conv2d #17108
Adds a new layout-agnostic quantized depthwise convolution operator
`etvk.q8ta_conv2d_dw` that uses BufferMetadata and layout specialization
constants to support arbitrary memory layouts (contiguous, channels-last,
4W4C block-packed, etc.).
Key changes:
1. New shader `q8ta_conv2d_dw.glsl`:
- Uses BufferMetadata for input/output tensor addressing
- Layout-aware via `inp_layout`/`outp_layout` specialization constants
- Processes 4 adjacent width positions × 4 channels per thread (4Wx4C tile)
- Includes optimized paths for simple layouts (outer_block_size == 1)
2. New indexing utilities in `indexing.glslh`:
- `texel_idx_to_tensor4d_idx()`: converts linear texel index to 4D tensor coords
- `tensor4d_idx_to_texel_idx()`: converts 4D tensor index to texel index
3. Code refactoring:
- Extract `Conv2DParams` struct and `create_conv2d_params()` to ConvolutionUtils.h
- Create Q8taConv2dDW.cpp with new operator implementation
- Add Q8taConv2d.h with public API declarations
- Move `prepack_quantized_conv2d_dw_weight()` to new implementation file
4. New workgroup size helpers:
- `pick_q8ta_conv2d_dw_global_wg_size()`: computes {W4, H, C4} dispatch size
- `pick_q8ta_conv2d_dw_local_wg_size()`: adaptive local size based on tensor dims
5. Test updates:
- Rename test to `test_q8_conv2d_dw.cpp`
- Add `TestQ8taConv2d.cpp` with shared test utilities
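To make the operator's arithmetic concrete, here is a minimal scalar reference of a per-tensor affine (q8ta) depthwise conv2d, where each output channel convolves only its matching input channel. The function names, layout (plain contiguous CHW), and quantization details below are illustrative sketches, not the operator's actual host API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a real value to int8 with a per-tensor scale and zero point.
int8_t quantize(float v, float scale, int32_t zp) {
  int32_t q = static_cast<int32_t>(std::lround(v / scale)) + zp;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Scalar reference: stride 1, no padding, symmetric int8 weights.
// Input and output are contiguous CHW; weights are C x KH x KW.
std::vector<int8_t> depthwise_conv2d_ref(
    const std::vector<int8_t>& inp, int C, int H, int W,
    const std::vector<int8_t>& wgt, int KH, int KW,
    float inp_scale, int32_t inp_zp,
    float wgt_scale,
    float out_scale, int32_t out_zp) {
  const int OH = H - KH + 1, OW = W - KW + 1;
  std::vector<int8_t> out(C * OH * OW);
  for (int c = 0; c < C; ++c)
    for (int oh = 0; oh < OH; ++oh)
      for (int ow = 0; ow < OW; ++ow) {
        int32_t acc = 0;
        for (int kh = 0; kh < KH; ++kh)
          for (int kw = 0; kw < KW; ++kw) {
            // Subtract the activation zero point before accumulating.
            int32_t x = inp[(c * H + oh + kh) * W + (ow + kw)] - inp_zp;
            int32_t w = wgt[(c * KH + kh) * KW + kw];
            acc += x * w;
          }
        // Rescale the int32 accumulator back to the output's int8 domain.
        float real = acc * inp_scale * wgt_scale;
        out[(c * OH + oh) * OW + ow] = quantize(real, out_scale, out_zp);
      }
  return out;
}
```

The shader computes the same accumulation, but vectorized over a 4Wx4C tile per invocation instead of one output element at a time.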
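The two indexing helpers can be sketched on the host side for one concrete case: a contiguous layout whose texels pack 4 elements along W. This is an assumption-laden model of the GLSL utilities (the real versions consult BufferMetadata and handle arbitrary layouts); the struct and parameter names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical 4D tensor coordinate (WHCN order, W innermost).
struct Tensor4DIdx {
  uint32_t w, h, c, n;
};

// Convert a linear texel index to 4D coordinates. W4 is the texel count
// along W, i.e. ceil(W / 4); each texel covers elements w .. w + 3.
Tensor4DIdx texel_idx_to_tensor4d_idx(
    uint32_t texel_idx, uint32_t W4, uint32_t H, uint32_t C) {
  Tensor4DIdx idx;
  idx.w = (texel_idx % W4) * 4;  // first element covered by this texel
  texel_idx /= W4;
  idx.h = texel_idx % H;
  texel_idx /= H;
  idx.c = texel_idx % C;
  idx.n = texel_idx / C;
  return idx;
}

// Inverse mapping: 4D coordinates back to the linear texel index.
uint32_t tensor4d_idx_to_texel_idx(
    const Tensor4DIdx& idx, uint32_t W4, uint32_t H, uint32_t C) {
  return ((idx.n * C + idx.c) * H + idx.h) * W4 + idx.w / 4;
}
```

The two functions are inverses by construction, which is the property the shader relies on when translating between input and output layouts.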
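Since each invocation covers a 4Wx4C tile, the {W4, H, C4} dispatch size reduces to ceiling divisions along W and C. The sketch below models that computation plus one plausible adaptive local-size policy; the function names and the 64-invocation budget are assumptions for illustration, and the real helpers read extents from the graph's tensor representation:

```cpp
#include <array>
#include <cstdint>

// Global grid: one invocation per 4Wx4C tile, so divide W and C by 4
// (rounding up) and keep H as-is.
std::array<uint32_t, 3> pick_global_wg_size(uint32_t W, uint32_t H, uint32_t C) {
  auto div_up = [](uint32_t a, uint32_t b) { return (a + b - 1) / b; };
  return {div_up(W, 4), H, div_up(C, 4)};
}

// Adaptive local size sketch: grow each axis by powers of two, preferring
// the W-texel axis, then C-texels, then H, without exceeding a total
// invocation budget or the global extent along that axis.
std::array<uint32_t, 3> pick_local_wg_size(const std::array<uint32_t, 3>& g) {
  std::array<uint32_t, 3> l = {1, 1, 1};
  for (int axis : {0, 2, 1}) {
    while (l[0] * l[1] * l[2] < 64 && l[axis] * 2 <= g[axis]) {
      l[axis] *= 2;
    }
  }
  return l;
}
```

Keeping the local size within the device's invocation limit while matching the tensor's actual extents avoids launching mostly-idle workgroups on small tensors.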
Differential Revision: [D92061368](https://our.internmc.facebook.com/intern/diff/D92061368/)
[ghstack-poisoned]