[ET-VK][qconv] Add layout-flexible impl of quantized depthwise conv2d #17108
Adds a new layout-agnostic quantized depthwise convolution operator
`etvk.q8ta_conv2d_dw` that uses BufferMetadata and layout specialization
constants to support arbitrary memory layouts (contiguous, channels-last,
4W4C block-packed, etc.).
Key changes:
1. New shader `q8ta_conv2d_dw.glsl`:
- Uses BufferMetadata for input/output tensor addressing
- Layout-aware via `inp_layout`/`outp_layout` specialization constants
- Processes 4 adjacent width positions × 4 channels per thread (4Wx4C tile)
- Includes optimized paths for simple layouts (outer_block_size == 1)
2. New indexing utilities in `indexing.glslh`:
- `texel_idx_to_tensor4d_idx()`: converts linear texel index to 4D tensor coords
- `tensor4d_idx_to_texel_idx()`: converts 4D tensor index to texel index
3. Code refactoring:
- Extract `Conv2DParams` struct and `create_conv2d_params()` to ConvolutionUtils.h
- Create Q8taConv2dDW.cpp with new operator implementation
- Add Q8taConv2d.h with public API declarations
- Move `prepack_quantized_conv2d_dw_weight()` to new implementation file
4. New workgroup size helpers:
- `pick_q8ta_conv2d_dw_global_wg_size()`: computes {W4, H, C4} dispatch size
- `pick_q8ta_conv2d_dw_local_wg_size()`: adaptive local size based on tensor dims
5. Test updates:
- Rename test to `test_q8_conv2d_dw.cpp`
- Add `TestQ8taConv2d.cpp` with shared test utilities
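To make the operator's arithmetic concrete, here is a minimal scalar reference of a per-tensor affine (q8ta) depthwise conv2d, where each output channel convolves only its matching input channel. The function names, layout (plain contiguous CHW), and quantization details below are illustrative sketches, not the operator's actual host API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a real value to int8 with a per-tensor scale and zero point.
int8_t quantize(float v, float scale, int32_t zp) {
  int32_t q = static_cast<int32_t>(std::lround(v / scale)) + zp;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Scalar reference: stride 1, no padding, symmetric int8 weights.
// Input and output are contiguous CHW; weights are C x KH x KW.
std::vector<int8_t> depthwise_conv2d_ref(
    const std::vector<int8_t>& inp, int C, int H, int W,
    const std::vector<int8_t>& wgt, int KH, int KW,
    float inp_scale, int32_t inp_zp,
    float wgt_scale,
    float out_scale, int32_t out_zp) {
  const int OH = H - KH + 1, OW = W - KW + 1;
  std::vector<int8_t> out(C * OH * OW);
  for (int c = 0; c < C; ++c)
    for (int oh = 0; oh < OH; ++oh)
      for (int ow = 0; ow < OW; ++ow) {
        int32_t acc = 0;
        for (int kh = 0; kh < KH; ++kh)
          for (int kw = 0; kw < KW; ++kw) {
            // Subtract the activation zero point before accumulating.
            int32_t x = inp[(c * H + oh + kh) * W + (ow + kw)] - inp_zp;
            int32_t w = wgt[(c * KH + kh) * KW + kw];
            acc += x * w;
          }
        // Rescale the int32 accumulator back to the output's int8 domain.
        float real = acc * inp_scale * wgt_scale;
        out[(c * OH + oh) * OW + ow] = quantize(real, out_scale, out_zp);
      }
  return out;
}
```

The shader computes the same accumulation, but vectorized over a 4Wx4C tile per invocation instead of one output element at a time.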
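The two indexing helpers can be sketched on the host side for one concrete case: a contiguous layout whose texels pack 4 elements along W. This is an assumption-laden model of the GLSL utilities (the real versions consult BufferMetadata and handle arbitrary layouts); the struct and parameter names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical 4D tensor coordinate (WHCN order, W innermost).
struct Tensor4DIdx {
  uint32_t w, h, c, n;
};

// Convert a linear texel index to 4D coordinates. W4 is the texel count
// along W, i.e. ceil(W / 4); each texel covers elements w .. w + 3.
Tensor4DIdx texel_idx_to_tensor4d_idx(
    uint32_t texel_idx, uint32_t W4, uint32_t H, uint32_t C) {
  Tensor4DIdx idx;
  idx.w = (texel_idx % W4) * 4;  // first element covered by this texel
  texel_idx /= W4;
  idx.h = texel_idx % H;
  texel_idx /= H;
  idx.c = texel_idx % C;
  idx.n = texel_idx / C;
  return idx;
}

// Inverse mapping: 4D coordinates back to the linear texel index.
uint32_t tensor4d_idx_to_texel_idx(
    const Tensor4DIdx& idx, uint32_t W4, uint32_t H, uint32_t C) {
  return ((idx.n * C + idx.c) * H + idx.h) * W4 + idx.w / 4;
}
```

The two functions are inverses by construction, which is the property the shader relies on when translating between input and output layouts.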
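Since each invocation covers a 4Wx4C tile, the {W4, H, C4} dispatch size reduces to ceiling divisions along W and C. The sketch below models that computation plus one plausible adaptive local-size policy; the function names and the 64-invocation budget are assumptions for illustration, and the real helpers read extents from the graph's tensor representation:

```cpp
#include <array>
#include <cstdint>

// Global grid: one invocation per 4Wx4C tile, so divide W and C by 4
// (rounding up) and keep H as-is.
std::array<uint32_t, 3> pick_global_wg_size(uint32_t W, uint32_t H, uint32_t C) {
  auto div_up = [](uint32_t a, uint32_t b) { return (a + b - 1) / b; };
  return {div_up(W, 4), H, div_up(C, 4)};
}

// Adaptive local size sketch: grow each axis by powers of two, preferring
// the W-texel axis, then C-texels, then H, without exceeding a total
// invocation budget or the global extent along that axis.
std::array<uint32_t, 3> pick_local_wg_size(const std::array<uint32_t, 3>& g) {
  std::array<uint32_t, 3> l = {1, 1, 1};
  for (int axis : {0, 2, 1}) {
    while (l[0] * l[1] * l[2] < 64 && l[axis] * 2 <= g[axis]) {
      l[axis] *= 2;
    }
  }
  return l;
}
```

Keeping the local size within the device's invocation limit while matching the tensor's actual extents avoids launching mostly-idle workgroups on small tensors.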
Differential Revision: [D92061368](https://our.internmc.facebook.com/intern/diff/D92061368/)
[ghstack-poisoned]