Conversation

@anzr299 (Collaborator) commented Sep 22, 2025

Changes

Introduced a new API that offers the weight compression algorithm for quantizers defined in torch.ao.
Currently only the OpenVINOQuantizer is supported.
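
A rough usage sketch of what this enables. Everything here is hypothetical: the entry-point name compress_pt2e, its import path, and the helper functions are assumptions for illustration, not the final API.

import nncf
from nncf.experimental.torch.fx import compress_pt2e  # assumed location of the new API

exported_model = get_exported_model()  # placeholder: a torch.fx.GraphModule from torch.export
calibration_dataset = nncf.Dataset(get_calibration_data())  # placeholder data source

quantizer = OpenVINOQuantizer()  # the torch.ao-style quantizer currently supported
compressed_model = compress_pt2e(exported_model, quantizer=quantizer, dataset=calibration_dataset)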

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

@anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions bot added the API (Public API-impacting changes) label Sep 22, 2025
@anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov self-requested a review September 22, 2025 15:03

@daniil-lyakhov (Collaborator) left a comment

Can I see the PR with OpenVINOQuantizer?

Comment on lines 34 to 35
) -> torch.fx.GraphModule:
    self._quantizer = quantizer

Type hints and a docstring are missing.

@daniil-lyakhov self-requested a review November 5, 2025 12:15

@nikita-savelyevv (Collaborator) left a comment

Huge work, thanks @anzr299!

Mostly minor comments from my side. Overall the updated approach in src/nncf/quantization/algorithms/weight_compression/algorithm.py looks good in my opinion and does not change the logic of the algorithm.

The only significant difference I noticed is that ratio_defining_params are initialized with primary_config from the start, and then some of the parameters are converted back to backup precision after the mixed-precision algorithm runs. Before, it was the other way around. This looks a bit cumbersome during mixed-precision assignment, but it avoids passing group_size_values around, which is an improvement over the previous approach.
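
For readers following along, a simplified sketch of the updated flow described above. The class and variable names are stand-ins, not the actual NNCF code.

from dataclasses import dataclass

@dataclass
class WeightParam:  # stand-in for the real weight parameter class
    name: str
    compression_config: str  # stand-in for WeightCompressionConfig

primary_config, backup_config = "int4_sym", "int8_asym"

# 1) Every ratio-defining parameter starts out with the primary config.
ratio_defining_params = [WeightParam(name, primary_config) for name in ("w1", "w2", "w3")]

# 2) The mixed-precision algorithm selects which parameters keep it (illustrative result).
primary_names = {"w1"}

# 3) The rest are demoted back to the backup precision, which is what removes
#    the need to pass group_size_values around.
for param in ratio_defining_params:
    if param.name not in primary_names:
        param.compression_config = backup_config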


  return {
-     "mode": mode,
+     "mode": mode if isinstance(mode, nncf.CompressWeightsMode) else nncf.CompressWeightsMode(mode),

What is the scenario where this is needed? Why can't we provide an instance of nncf.CompressWeightsMode instead of a string?
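
For context, this is what the coerced line does in isolation; the coercion only matters if mode can arrive as a plain string, e.g. from serialized parameters. A minimal sketch with a simplified stand-in for nncf.CompressWeightsMode:

from enum import Enum

class CompressWeightsMode(Enum):  # simplified stand-in for nncf.CompressWeightsMode
    INT8_ASYM = "int8_asym"
    INT4_SYM = "int4_sym"

mode = "int4_sym"  # e.g. a value read back from a config file
if not isinstance(mode, CompressWeightsMode):
    mode = CompressWeightsMode(mode)  # Enum(value) looks the member up by its value
assert mode is CompressWeightsMode.INT4_SYM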


return ratio_defining_params

def _get_backup_config(self, weight_dtype: TensorDataType) -> WeightCompressionConfig:

Suggested change
def _get_backup_config(self, weight_dtype: TensorDataType) -> WeightCompressionConfig:
def _get_backup_config(self, weight_dtype: TensorDataType) -> Optional[WeightCompressionConfig]:

model: TModel,
graph: NNCFGraph,
statistics_points: StatisticPointsContainer,
group_size_values: dict[str, int],

Please remove group_size_values from the docstring

Comment on lines +558 to +559
# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision

Suggested change
# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision
# At this point ratio_defining_params are all in primary precision. Below we update parameters
# which need to be set to the backup precision.


# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision
for weight_param in ratio_defining_params:

Suggested change
for weight_param in ratio_defining_params:
primary_precision_weight_params = set(primary_precision_weight_params)
for weight_param in ratio_defining_params:

This avoids quadratic complexity.
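
A self-contained illustration of the difference (sizes are arbitrary): membership checks on a list scan it element by element, so the loop over ratio_defining_params becomes O(n * m), while set lookups are O(1) on average, assuming the elements are hashable.

params = list(range(10_000))
selected_list = [p for p in params if p % 2 == 0]
selected_set = set(selected_list)

demoted_fast = [p for p in params if p not in selected_set]   # O(n) in total
demoted_slow = [p for p in params if p not in selected_list]  # O(n * m), noticeably slower
assert demoted_fast == demoted_slow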

Comment on lines +181 to +182
Applies Weight Compression to the torch.fx.GraphModule provided model
using provided torch.ao quantizer.

Suggested change
Applies Weight Compression to the torch.fx.GraphModule provided model
using provided torch.ao quantizer.
Applies Weight Compression to the torch.fx.GraphModule model using provided torch.ao quantizer.

Comment on lines +187 to +188
:param dataset: A representative dataset for the
calibration process.

Suggested change
:param dataset: A representative dataset for the
calibration process.
:param dataset: A representative dataset for the calibration process.

Comment on lines +133 to +135
pt2e_params = PT2E_PARAMS
if qparam.get("mode") in {QuantizationMode.INT8WO_ASYM, QuantizationMode.INT8WO_SYM}:
    pt2e_params = [{}]

Suggested change
pt2e_params = PT2E_PARAMS
if qparam.get("mode") in {QuantizationMode.INT8WO_ASYM, QuantizationMode.INT8WO_SYM}:
    pt2e_params = [{}]
pt2e_params = [{}] if qparam.get("mode") in INT8_COMPRESSION_MODES else PT2E_PARAMS

Comment on lines +148 to +149
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", [1, 3, 64]),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", [5]),

Suggested change
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", [1, 3, 64]),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", [5]),
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", (1, 3, 64)),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", (5,)),

quantizer_builder: Callable[..., OpenVINOQuantizer],
model_case: ModelCase,
quantizer_params,
pt2e_params,

I see that in this and some other cases below, the pt2e_params argument is not used. Is this on purpose? Won't it result in unnecessary duplication of tests?
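
For reference, a minimal pytest example of the duplication in question (the test name and parameter values are made up): pytest generates one test case per parametrized value even when the argument is never used, so identical test bodies run multiple times.

import pytest

@pytest.mark.parametrize("pt2e_params", [{}, {"some_option": True}])  # illustrative values
def test_compression(pt2e_params):
    # the body never touches pt2e_params, yet pytest still runs it once per value
    assert 2 + 2 == 4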
