Enable QuantFusionPass in compiler pipeline (#19728)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19728

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures
As of commit 181c7bd with merge base 7d8063f:
- NEW FAILURE: a job has failed.
- BROKEN TRUNK: some jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ethansfng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105728219.
Summary: Enable the new `QuantFusionPass`, which calls `pattern.fuse()` on each pattern instead of using the monolithic `QuantFusion` pass.

Differential Revision: D105728219
42b8ffe to 8f2e4ea
8f2e4ea to 250f45e
250f45e to 9010a85
9010a85 to 6e7ff7c
…19743)

Summary: torchao's `convert_pt2e` adds an `out_dtype` kwarg to dequant nodes for bf16 models. `cadence::dequantize_per_tensor` doesn't support this kwarg (it hardcodes float32 output), so `ReplacePT2DequantWithCadenceDequantPass` crashes when it forwards kwargs blindly to the cadence op. Strip `out_dtype` from kwargs before creating the cadence dequant node, and insert an `aten.to.dtype` cast after it to preserve the original output dtype semantics.

Differential Revision: D105630451
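The fix described above can be illustrated with a minimal, dependency-free sketch. It is not the actual pass: `Node` is a toy stand-in (not `torch.fx.Node`), `replace_dequant` is a hypothetical helper name, dtypes are plain strings, and the choice to cast whenever `out_dtype` is present is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy graph node for illustration only; not the torch.fx.Node API."""
    target: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def replace_dequant(node: Node) -> Node:
    """Rewrite a PT2 dequant node into a cadence dequant node.

    `out_dtype` is stripped because the cadence op does not accept it;
    a trailing cast node restores the dtype the caller asked for.
    """
    kwargs = dict(node.kwargs)                 # copy: leave the original node untouched
    out_dtype = kwargs.pop("out_dtype", None)  # strip the unsupported kwarg
    cadence = Node("cadence::dequantize_per_tensor", node.args, kwargs)
    if out_dtype is None:
        return cadence                         # nothing to preserve
    # Assumption: always cast when out_dtype was given.
    return Node("aten.to.dtype", (cadence, out_dtype))


# Usage: a bf16 model's dequant node carrying out_dtype.
pt2_node = Node("pt2_dequant", ("x", 0.1, 0), {"out_dtype": "bfloat16"})
rewritten = replace_dequant(pt2_node)
```

Forwarding `node.kwargs` unchanged is the failure mode the summary describes; popping the key from a copy avoids it without mutating the source node.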
Summary: Add infrastructure for per-pattern `fuse()` methods on Cadence `QuantizationPattern`:
- Add `anchor_ops()` (default: `tuple(partition_types())`) and `fuse()` (default: `None`) to the `QuantizationPattern` base class
- Add shared fusion helpers: `_get_dequant`, `_find_quant_user`, `_insert_fused_op`, `_maybe_route_depthwise_conv1d`, `_fuse_conv`, `_fuse_linear`, `_fuse_matmul`
- Add `QuantFusionPass` to `compiler_funcs.py`: a shared executor that iterates patterns, matches `anchor_ops()`, and calls `fuse()`, with debug logging and dead code elimination

Differential Revision: D105728137
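The control flow described above can be sketched in plain Python. This is a toy model under stated assumptions, not the code in `compiler_funcs.py`: `Node`, `Graph`, `ReluPattern`, `quant_fusion_pass`, and the op names are hypothetical, and the convention that `fuse()` returns the fused node (or `None` when nothing was fused) is assumed for illustration.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger(__name__)


@dataclass(eq=False)  # identity-based equality, like real graph nodes
class Node:
    op: str
    inputs: list = field(default_factory=list)


class Graph:
    def __init__(self, nodes, output):
        self.nodes = list(nodes)  # kept in topological order
        self.output = output

    def users(self, node):
        return [n for n in self.nodes if node in n.inputs]

    def replace_all_uses(self, old, new):
        for n in self.nodes:
            n.inputs = [new if i is old else i for i in n.inputs]
        if self.output is old:
            self.output = new

    def eliminate_dead_code(self):
        # Walk backwards so that removing a user frees its producers.
        for n in reversed(list(self.nodes)):
            if n is not self.output and not self.users(n):
                self.nodes.remove(n)


class QuantizationPattern:
    def partition_types(self):
        raise NotImplementedError

    def anchor_ops(self):
        return tuple(self.partition_types())  # default stated in the summary

    def fuse(self, graph, anchor):
        return None  # default: pattern has no fusion


class ReluPattern(QuantizationPattern):
    def partition_types(self):
        return ["relu"]

    def fuse(self, graph, anchor):
        dq = anchor.inputs[0]
        quant_users = [u for u in graph.users(anchor) if u.op == "quantize"]
        if dq.op != "dequantize" or not quant_users:
            return None  # not a dequantize -> relu -> quantize chain
        fused = Node("quantized_relu", [dq.inputs[0]])
        graph.nodes.insert(graph.nodes.index(anchor), fused)
        graph.replace_all_uses(quant_users[0], fused)
        return fused


def quant_fusion_pass(graph, patterns):
    fused_count = 0
    for pattern in patterns:
        anchors = pattern.anchor_ops()
        for node in list(graph.nodes):
            if node.op in anchors and pattern.fuse(graph, node) is not None:
                log.debug("fused %s via %s", node.op, type(pattern).__name__)
                fused_count += 1
    graph.eliminate_dead_code()  # drop the now-unused dq/op/q nodes
    return fused_count


# Usage: x -> dequantize -> relu -> quantize collapses to x -> quantized_relu.
x = Node("placeholder")
dq = Node("dequantize", [x])
relu = Node("relu", [dq])
q = Node("quantize", [relu])
graph = Graph([x, dq, relu, q], output=q)
fused_count = quant_fusion_pass(graph, [ReluPattern()])
```

The executor never inspects the pattern's concrete type; each pattern owns its rewrite, which is the design shift away from a single pass that switches on pattern class.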
Summary: Add `fuse()` implementations to the first batch of Cadence `QuantizationPattern` subclasses, the standard fully-quantized patterns that use the shared `_fuse_conv`, `_fuse_linear`, and `_fuse_matmul` helpers:
- `AddmmPattern`: transpose weight + linear fusion
- `AddPattern`: two-input quantized add
- `AddReluBasePattern`: add+relu fusion with `anchor_ops()` override
- `BmmPattern`, `MatmulPattern`: matmul fusion via `_fuse_matmul`
- `CatPattern`: cat passthrough on quantized inputs
- `Conv1dPattern`, `Conv2dPattern`: conv fusion via `_fuse_conv` with depthwise routing
- `LayerNormPattern`: layer norm with default weight/bias creation
- `LinearPattern`: linear fusion via `_fuse_linear`

Differential Revision: D105728156
Summary: Add `fuse()` implementations to the remaining Cadence `QuantizationPattern` subclasses:
- `MaxPool2dPattern`, `MaxPool2dWithoutIndicesPattern`: order-preserving pool on quantized values
- `ReluBasePattern` (inherited by `ReluPattern0`/`1`): relu with requantization
- `ConvReluBasePattern` (inherited by `Conv1d`/`2dReluPattern0`/`1`): conv+relu fusion with `anchor_ops()` override to match only the conv op
- `SoftmaxPattern`: softmax with dummy mask/pos tensors and fake_mode metadata
- `MixedW8A32LinearPattern`: weight-only quantized linear (no input/output quant)
- `MixedW8A32ConvPattern`: weight-only quantized conv1d with NCL→NLC permutation
- `MixedW8A32GruPattern`: weight-only quantized GRU with 4 dequantized params

Differential Revision: D105728177
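One reading of "order-preserving pool on quantized values" is that max pooling can run directly on the quantized integers, because affine quantization with a positive scale is monotone non-decreasing and therefore commutes with `max`. The sketch below checks that property on a 1-D toy case; `quantize` and `max_pool_1d` are hypothetical helpers written for this example, not Cadence ops.

```python
def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Affine quantization: monotone non-decreasing in x when scale > 0."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamping is monotone as well


def max_pool_1d(values, window):
    """Non-overlapping max pool over a flat list (stride == window)."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


xs = [0.12, -0.5, 0.33, 0.31, 1.9, -2.0, 0.0, 0.049]
scale, zero_point = 0.05, 3

# Pool the quantized values directly...
pooled_quantized = max_pool_1d([quantize(x, scale, zero_point) for x in xs], 2)
# ...or pool in float first and quantize afterwards.
quantized_pooled = [quantize(m, scale, zero_point) for m in max_pool_1d(xs, 2)]
```

Because both routes agree, a fused max pool needs no dequantize/requantize round trip as long as input and output share the same quantization parameters (an assumption of this sketch).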
Summary: Both and Cadence now use the shared `QuantFusionPass` from `compiler_funcs.py`.
- `QuantFusionPass` in `compiler_funcs.py` iterates patterns, matches `anchor_ops()`, and calls `fuse()` on each match, with debug logging and dead code elimination
- Cadence: `compiler.py` now uses `QuantFusionPass` instead of the old `QuantFusion` isinstance switch
- Removed the Cadence `compiler` target's dep on `:fusion_pass` (no longer imported)

Differential Revision: D105728219
6e7ff7c to 181c7bd