
Improve accuracy for models using shuffle, unshuffle, cat ops #19159

Open
abhinaykukkadapu wants to merge 1 commit into pytorch:main from abhinaykukkadapu:export-D102626539

Conversation

@abhinaykukkadapu
Contributor

@abhinaykukkadapu abhinaykukkadapu commented Apr 27, 2026

Summary:

Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for aten.cat. Preserve shared qparams for pixel_shuffle and pixel_unshuffle, extend split_with_sizes_copy coverage, and add regressions for mismatched cat branches plus value-preserving ops that must use SharedQuantizationSpec.

Differential Revision: D102626539
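For context on the shared-qparams part of the summary: pixel_shuffle only rearranges values (it moves elements from channels into spatial positions), so its output contains exactly the same set of values as its input and can safely share the input's quantization parameters via SharedQuantizationSpec. A minimal pure-Python sketch illustrating the value-preserving property (the `pixel_shuffle` helper here is hypothetical, mirroring torch.nn.PixelShuffle semantics, not the ExecuTorch implementation):

```python
# Sketch of pixel_shuffle on nested lists: (C*r*r, H, W) -> (C, H*r, W*r).
# Hypothetical helper for illustration only, following the semantics
# out[c][h*r+i][w*r+j] = x[c*r*r + i*r + j][h][w].
def pixel_shuffle(x, r):
    crr, H, W = len(x), len(x[0]), len(x[0][0])
    C = crr // (r * r)
    out = [[[0.0] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c in range(C):
        for hh in range(H * r):
            for ww in range(W * r):
                # (hh % r, ww % r) selects the source channel offset,
                # (hh // r, ww // r) the source spatial position.
                out[c][hh][ww] = x[c * r * r + (hh % r) * r + (ww % r)][hh // r][ww // r]
    return out

# 4 input channels of 2x2 -> 1 output channel of 4x4 (upscale factor r = 2).
x = [[[float(16 * c + 2 * h + w) for w in range(2)] for h in range(2)] for c in range(4)]
y = pixel_shuffle(x, 2)

# The output is a pure permutation of the input values, so any qparams that
# are valid for the input are equally valid for the output.
flat_in = sorted(v for ch in x for row in ch for v in row)
flat_out = sorted(v for ch in y for row in ch for v in row)
```

Since `flat_in == flat_out`, quantizing the output with the input's scale and zero point introduces no additional error, which is why the PR preserves shared qparams for pixel_shuffle and pixel_unshuffle.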

@pytorch-bot

pytorch-bot Bot commented Apr 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19159

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 4 Unrelated Failures

As of commit e3284e0 with merge base cdcc915:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Apr 27, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync
Contributor

meta-codesync Bot commented Apr 27, 2026

@abhinaykukkadapu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102626539.

@meta-codesync meta-codesync Bot changed the title Improve accuracy for models using shuffle, unshuffle, cat ops Improve accuracy for models using shuffle, unshuffle, cat ops (#19159) Apr 27, 2026
abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request Apr 27, 2026
@abhinaykukkadapu abhinaykukkadapu linked an issue Apr 28, 2026 that may be closed by this pull request
Collaborator

@winskuo-quic winskuo-quic left a comment


Hi @abhinaykukkadapu,
Thanks a lot for this PR and for making all these changes to improve QNN backend accuracy.
I think shuffle & unshuffle LGTM.
However, I think there might be some concerns with the concat operation modification.
I have a PR that reproduces the concerns: #19182. I also compare the outputs and graphs in its summary section.
Please have a look.

If the change to concat behavior is required to fix accuracy issues for the model you mentioned in #19159, could we possibly reuse the custom_annotation feature under https://github.com/pytorch/executorch/blob/main/backends/qualcomm/quantizer/custom_annotation.py for that model?

Thanks

qscheme=quantization_config.output_activation.qscheme,
quant_max=quantization_config.output_activation.quant_max,
quant_min=quantization_config.output_activation.quant_min,
observer_or_fake_quant_ctr=ConcatObserver.with_args(
Collaborator


I believe this change is reverting what this PR did: #15162.
The reason #15162 was introduced is that input[0] could not cover the entire range of values for the concat output, so a lot of output values were clipped.

If you have 2 input tensors like:

    sample_input = (
        torch.tensor([[[[-10.0, 2.0], [3.0, 4.0]]]]),
        torch.tensor([[[[1.0, 3.0], [8.0, 10.0]]]]),
    )

then after they go through the cat operation, you will get the wrong values with this PR:

    [tensor([[[[-9.9798, 1.9849], [ 2.9774, 4.0250], [ 1.0476, 3.0325], [ 4.0802, 4.0802]]]])]
I have a demo PR to reproduce this error, please have a look:
#19182
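The clipping described in this comment can be reproduced numerically in plain Python. This is a hedged sketch (the `qparams` and `quant_dequant` helpers are hypothetical, not the ExecuTorch observers) showing why deriving the cat output's qparams from input[0] alone clips the second branch, while covering the union of input ranges, as ConcatObserver does, preserves all values:

```python
# Asymmetric uint8 fake-quantization, used to compare two choices of qparams
# for the output of cat. Hypothetical helpers for illustration only.

def qparams(lo, hi, qmin=0, qmax=255):
    """Scale and zero point for the float range [lo, hi]."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = qmin - round(lo / scale)
    return scale, zero_point

def quant_dequant(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize, clamp to [qmin, qmax], then dequantize."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))        # out-of-range values are clipped here
    return (q - zero_point) * scale

branch_a = [-10.0, 2.0, 3.0, 4.0]      # input[0], observed range [-10, 4]
branch_b = [1.0, 3.0, 8.0, 10.0]       # input[1], observed range [1, 10]
cat_out = branch_a + branch_b

# Bug: reuse input[0]'s qparams for the cat output -> 8.0 and 10.0 clip to ~4.
s, zp = qparams(-10.0, 4.0)
clipped = [quant_dequant(v, s, zp) for v in cat_out]

# Fix: the output qparams must cover the union of both input ranges.
s, zp = qparams(-10.0, 10.0)
covered = [quant_dequant(v, s, zp) for v in cat_out]
```

With the narrow qparams the last element dequantizes to about 4.01 rather than 10.0, matching the clipped tensor shown in the comment; with the union range every element round-trips within one scale step (about 0.08).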

Contributor Author


Thanks @winskuo-quic for the detailed review. I agree that it might've worked for the model but might not work when the ranges are skewed like in your example. Let me revert cat to ConcatObserver and test the accuracy.

# node1 -> q_ui8 (n) -> dq_ui8 -> q_int32 -> dq_int32 -> node2 -> ....
# We store {node2: quant_attr in dq_int32} in node1.meta
if n.target in q_ops and n.args[0].target not in dq_ops:
self._annotate_cat_requant(n)
Collaborator


I think adding a specific op in a general pass would not be the best option. Do you think we can possibly move the logic to htp_rules.py if this is necessary?

Contributor Author


Yeah, I was also going back and forth on this; I added a pass instead of shoving it in here.

abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request Apr 29, 2026
@abhinaykukkadapu abhinaykukkadapu force-pushed the export-D102626539 branch 2 times, most recently from 779146e to 7fdec04 Compare April 29, 2026 16:36
abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request Apr 29, 2026
@abhinaykukkadapu
Contributor Author

@winskuo-quic here is the latest patch; it uses the same ConcatObserver approach as before. Additionally, the pass for requant is introduced after the generic AnnotateQuantAttr pass. Please review when you get a chance.

FYI, the SQNR stayed above target with this changeset (> 30 dB).

@github-project-automation github-project-automation Bot moved this to To triage in ExecuTorch Core Apr 29, 2026
@abhinaykukkadapu abhinaykukkadapu moved this from To triage to In progress in ExecuTorch Core Apr 29, 2026
@abhinaykukkadapu abhinaykukkadapu self-assigned this Apr 30, 2026
@abhinaykukkadapu
Contributor Author

@winskuo-quic @shewu-quic bumping this up in your queue.

@winskuo-quic
Collaborator

@winskuo-quic @shewu-quic bumping this up in your queue.

@abhinaykukkadapu Could you share more about how to see the difference with the repro script (#19179)? An example script would be appreciated.

I tried running the script with this PR but I am getting the same numbers both before and after disabling AnnotateConcatRequant

@abhinaykukkadapu
Contributor Author

@winskuo-quic @shewu-quic bumping this up in your queue.

@abhinaykukkadapu Could you share more about how to see the difference with the repro script (#19179)? An example script would be appreciated.

I tried running the script with this PR but I am getting the same numbers both before and after disabling AnnotateConcatRequant.

This PR is unrelated to the repro script. As I mentioned in the email, that script is for an issue I couldn't solve outside QNN; it reproduces an issue that happened with that specific sequence of ops, see the quant_vs_runtime_sqnr.

What this PR actually does is fix the wrong annotation for shuffle, unshuffle, and cat ops requant (we need requant even though we use the observer, which was missing). Let me know if you need more info.

@winskuo-quic
Collaborator

This PR is unrelated to the repro script. As I mentioned in the email, that script is for an issue I couldn't solve outside QNN; it reproduces an issue that happened with that specific sequence of ops, see the quant_vs_runtime_sqnr.

What this PR actually does is fix the wrong annotation for shuffle, unshuffle, and cat ops requant (we need requant even though we use the observer, which was missing). Let me know if you need more info.

Sorry, I think I misunderstood your comments above where you mentioned SQNR increased with the Concat fix, so I thought updating quant configs for Concat could improve #19179.

I think I am still unable to see the difference with and without AnnotateConcatRequant. I tried running test_qnn_backend_cat_uses_concat_observer_output_qspec and still can't see a difference in the final graph.
Could you please provide a simple test or the model you are enabling that shows the difference before and after enabling AnnotateConcatRequant, e.g. without the pass -> accuracy fails, and with the pass -> accuracy passes? This would also serve as a UT showing that AnnotateConcatRequant is necessary.
Thanks

abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request May 6, 2026
@abhinaykukkadapu
Contributor Author

This PR is unrelated to the repro script. As I mentioned in the email, that script is for an issue I couldn't solve outside QNN; it reproduces an issue that happened with that specific sequence of ops, see the quant_vs_runtime_sqnr.
What this PR actually does is fix the wrong annotation for shuffle, unshuffle, and cat ops requant (we need requant even though we use the observer, which was missing). Let me know if you need more info.

Sorry, I think I misunderstood your comments above where you mentioned SQNR increased with the Concat fix, so I thought updating quant configs for Concat could improve #19179.

I think I am still unable to see the difference with and without AnnotateConcatRequant. I tried running test_qnn_backend_cat_uses_concat_observer_output_qspec and still can't see a difference in the final graph. Could you please provide a simple test or the model you are enabling that shows the difference before and after enabling AnnotateConcatRequant, e.g. without the pass -> accuracy fails, and with the pass -> accuracy passes? This would also serve as a UT showing that AnnotateConcatRequant is necessary. Thanks

Thanks for the clarification, and sorry for the confusion. The SQNR comment I made was from a test with the real DSR model. TL;DR: the requant is needed for cat ops when there is a shared spec in one of the input branches; in that case annotation itself is not sufficient, and we need to requant after convert_pt2e, which is done by the pass.

Added unit tests to simulate the low and high accuracy with and without the pass.
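To illustrate why annotation alone is not enough: if a cat input branch carries qparams inherited through SharedQuantizationSpec that differ from the cat output's qparams, the integer values in that branch must actually be requantized, not merely relabeled with the output qparams. A small sketch with made-up ranges (the `requantize` helper is hypothetical, not the ExecuTorch pass itself):

```python
def requantize(q, s_in, zp_in, s_out, zp_out, qmin=0, qmax=255):
    """Remap an integer value from one quantization domain to another."""
    x = (q - zp_in) * s_in                  # dequantize in the branch domain
    q2 = round(x / s_out) + zp_out          # re-quantize into the cat domain
    return max(qmin, min(qmax, q2))

# Hypothetical domains: branch covers [0, 10], cat output covers [-10, 10].
s_branch, zp_branch = 10.0 / 255, 0
s_cat, zp_cat = 20.0 / 255, 128

q = round(8.0 / s_branch) + zp_branch       # 8.0 quantized in the branch domain

# Wrong: keep the raw integer and just attach the cat qparams to it.
relabeled = (q - zp_cat) * s_cat            # decodes to ~5.96, not 8.0

# Right: requantize the integer into the cat domain first.
requanted = (requantize(q, s_branch, zp_branch, s_cat, zp_cat) - zp_cat) * s_cat
```

Here `relabeled` drifts by about 2 while `requanted` recovers 8.0, which is the kind of error the requant pass after convert_pt2e is meant to avoid.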

@winskuo-quic
Collaborator

Hi @abhinaykukkadapu,
Thanks for all the changes. I have checked the PR UT but I still observe that both cases (with and without the annotate_concat_requant pass) generate incorrect output compared to the nn.Module FP32 reference.

I think shuffle and unshuffle look good to me. Do you think we should merge these 2 first, and maybe we can take a look at concat quantization to see if there is a more general approach that covers all values?
Thanks


Labels

CLA Signed · fb-exported · meta-exported

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Accuracy issue in QCOM backend for DSR models

2 participants