Commit 58aed0e
Qualcomm: cap inf replacement value to fix 16a16w accuracy regression
PR #19660 folded ReplaceInfValues into QnnQuantizer._replace_inf and made
the inf stand-in equal to the full quant range. For 16a16w that is 65535
(vs the previous fixed 255), which blows up the attention-mask quant scale
and breaks stories110M decoding in test-llama-runner-qnn-linux. Cap the
magnitude at 255 to restore prior behavior; 8a8w is unaffected.
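The arithmetic behind the regression can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `QnnQuantizer._replace_inf`. The helpers `inf_replacement`, `replace_inf` and `affine_scale` and the constant `MAX_INF_REPLACEMENT` are hypothetical names. The sketch assumes "full quant range" means `2**bits - 1`, and it assumes a plain asymmetric min/max observer whose range always includes 0. It shows the 16-bit stand-in growing from 255 to 65535, and a `[0, -inf]` attention mask's scale growing 257x as a result.

```python
import math

# Hypothetical cap mirroring the commit message; not the real constant name.
MAX_INF_REPLACEMENT = 255


def inf_replacement(num_bits: int, cap: bool = True) -> float:
    """Finite magnitude used in place of +/-inf (assumed: full range = 2**bits - 1)."""
    full_range = 2 ** num_bits - 1  # 255 for 8-bit, 65535 for 16-bit
    return float(min(full_range, MAX_INF_REPLACEMENT)) if cap else float(full_range)


def replace_inf(values, num_bits, cap=True):
    """Swap +inf/-inf for +/- the stand-in magnitude; finite values pass through."""
    mag = inf_replacement(num_bits, cap)
    out = []
    for v in values:
        if v == math.inf:
            out.append(mag)
        elif v == -math.inf:
            out.append(-mag)
        else:
            out.append(v)
    return out


def affine_scale(values, num_bits):
    """Per-tensor asymmetric scale (max - min) / (qmax - qmin), range forced to include 0."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    return (hi - lo) / (2 ** num_bits - 1)


mask = [0.0, -math.inf]  # typical causal-mask entries
uncapped = replace_inf(mask, 16, cap=False)  # [0.0, -65535.0]
capped = replace_inf(mask, 16)               # [0.0, -255.0]
print(affine_scale(uncapped, 16))  # 1.0
print(affine_scale(capped, 16))    # 255 / 65535, about 0.00389
```

Under these assumptions the cap changes nothing for 8-bit (`min(255, 255)`). For 16-bit it keeps the mask's quantization step small enough to resolve the rest of the tensor's range.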
Co-authored-by: Claude <noreply@anthropic.com>
1 parent aada6d7
1 file changed
Lines changed: 6 additions & 1 deletion
Hunk: @@ -416,7 +416,12 @@ (1 line removed at original line 419; 6 lines added at new lines 419-424)