Commit 1aaa77d
[OMNIML-2244] Add support for auto quantizing a model (#571)
## What does this PR do?
**Type of change:**
Example update
**Overview:**
- Added support for quantizing a model with `mtq.auto_quantize()` via the new `--quantize_mode auto` option
## Usage
```bash
python torch_quant_to_onnx.py \
--timm_model_name vit_small_patch16_224 \
--quantize_mode auto \
--onnx_save_path models/vit_auto_quant.onnx \
--calibration_data_size 512 \
--batch_size 8 \
--auto_quantization_formats NVFP4_AWQ_LITE_CFG FP8_DEFAULT_CFG INT8_DEFAULT_CFG \
--effective_bits 4.8 \
--num_score_steps 128
```
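For reference, a minimal sketch of the underlying Python API call, assuming keyword names from the ModelOpt docs for `mtq.auto_quantize()` (they may differ between releases, and some releases expect the `mtq.*_CFG` config objects rather than format names); the calibration loader below is a stand-in for the script's real image pipeline:

```python
# Hedged sketch, not the example script itself: keyword names follow the
# ModelOpt docs for mtq.auto_quantize() and may vary across releases.
import timm
import torch
import torch.nn.functional as F
import modelopt.torch.quantization as mtq

model = timm.create_model("vit_small_patch16_224", pretrained=True).eval()

# Placeholder calibration data; the script uses 512 real images in batches
# of 8 (--calibration_data_size 512 --batch_size 8).
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(3, 224, 224), 0) for _ in range(512)], batch_size=8
)

def forward_step(model, batch):
    images, _ = batch
    return model(images)

def loss_func(output, batch):
    _, labels = batch
    return F.cross_entropy(output, labels)

model, search_state = mtq.auto_quantize(
    model,
    constraints={"effective_bits": 4.8},  # --effective_bits 4.8
    # --auto_quantization_formats; some releases take mtq config objects instead
    quantization_formats=["NVFP4_AWQ_LITE_CFG", "FP8_DEFAULT_CFG", "INT8_DEFAULT_CFG"],
    data_loader=calib_loader,
    forward_step=forward_step,
    loss_func=loss_func,
    num_score_steps=128,  # --num_score_steps 128
)
```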
## Testing
Verified auto quantization of the ViT model (`vit_small_patch16_224`). Per-layer recipes selected by the search:
```
AutoQuantize best recipe for patch_embed.proj: NONE(effective-bits: 16.0)
AutoQuantize best recipe for blocks.0.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.0.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.0.mlp.fc1: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.0.mlp.fc2: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.1.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.1.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.1.mlp.fc1: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.1.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.2.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.2.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.2.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.2.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.3.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.3.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.3.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.3.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.4.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.4.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.4.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.4.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.5.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.5.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.5.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.5.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.6.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.6.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.6.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.6.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.7.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.7.attn.proj: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.7.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.7.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.8.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.8.attn.proj: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.8.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.8.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.9.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.9.attn.proj: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.9.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.9.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.10.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.10.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.10.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.10.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.11.attn.qkv: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.11.attn.proj: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize best recipe for blocks.11.mlp.fc1: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for blocks.11.mlp.fc2: NVFP4_AWQ_LITE_CFG(effective-bits: 4.0)
AutoQuantize best recipe for head: FP8_DEFAULT_CFG(effective-bits: 8.0)
AutoQuantize effective bits from search: 4.80
```
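As a rough cross-check of the reported value, a hedged sketch that recomputes effective bits from the log above, assuming it is approximately the parameter-weighted average of per-layer weight bit widths (the exact AutoQuantize definition may differ); `autoquantize.log` is a hypothetical dump of the output:

```python
# Hedged sanity check: weight each layer's effective-bits by its parameter
# count and average. The log path and the averaging definition are assumptions.
import re
import timm

with open("autoquantize.log") as f:  # hypothetical file holding the log above
    log = f.read()

bits = {
    name: float(b)
    for name, b in re.findall(
        r"best recipe for (\S+): \S+\(effective-bits: ([\d.]+)\)", log
    )
}

model = timm.create_model("vit_small_patch16_224")
numel = {n: m.weight.numel() for n, m in model.named_modules() if n in bits}

avg = sum(numel[n] * bits[n] for n in bits) / sum(numel.values())
print(f"parameter-weighted average: {avg:.2f} bits")  # roughly the reported 4.80
```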
Accuracy comparison for the ViT model:

| Model | Top-1 accuracy | Top-5 accuracy |
|------------------------------------------------------|----------------|----------------|
| Original model (FP32) | 85.102% | 97.526% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
| MXFP8 Quantized | 85.02% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| INT4 Quantized | 84.23% | 97.22% |
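For completeness, a hedged sketch of how the top-1/top-5 numbers above might be computed; `val_loader` is a placeholder for an ImageNet-style validation DataLoader and is not part of the script:

```python
# Hedged sketch: standard top-k accuracy over a validation loader.
import torch

@torch.no_grad()
def topk_accuracy(model, val_loader, ks=(1, 5)):
    correct = {k: 0 for k in ks}
    total = 0
    for images, labels in val_loader:
        # Indices of the k highest logits per sample, ordered best-first.
        preds = model(images).topk(max(ks), dim=1).indices
        for k in ks:
            correct[k] += (preds[:, :k] == labels[:, None]).any(dim=1).sum().item()
        total += labels.numel()
    return {k: 100.0 * correct[k] / total for k in ks}
```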
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No
---------
Signed-off-by: ajrasane <[email protected]>