Conversation

@anzr299 (Collaborator) commented Sep 22, 2025

Changes

Introduced a new API that offers the weight compression algorithm for quantizers defined in torch.ao.
Currently only the OpenVINOQuantizer is supported.
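
A rough usage sketch of what this enables. Everything here is hypothetical: the entry-point name compress_pt2e, its import path, and the helper functions are assumptions for illustration, not the final API.

import nncf
from nncf.experimental.torch.fx import compress_pt2e  # assumed location of the new API

exported_model = get_exported_model()  # placeholder: a torch.fx.GraphModule from torch.export
calibration_dataset = nncf.Dataset(get_calibration_data())  # placeholder data source

quantizer = OpenVINOQuantizer()  # the torch.ao-style quantizer currently supported
compressed_model = compress_pt2e(exported_model, quantizer=quantizer, dataset=calibration_dataset)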

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

@anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions bot added the API (Public API-impacting changes) label Sep 22, 2025
@anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov self-requested a review September 22, 2025 15:03

@daniil-lyakhov (Collaborator) left a comment

Can I see the PR with OpenVINOQuantizer?

Comment on lines 34 to 35
) -> torch.fx.GraphModule:
    self._quantizer = quantizer

Type hints and a docstring are missing.

@daniil-lyakhov self-requested a review November 5, 2025 12:15

@nikita-savelyevv (Collaborator) left a comment

Huge work, thanks @anzr299!

Mostly minor comments from my side. Overall the updated approach in src/nncf/quantization/algorithms/weight_compression/algorithm.py looks good in my opinion and does not change the logic of the algorithm.

The only significant difference I noticed is that ratio_defining_params are initialized with primary_config from the start, and then some of the parameters are converted back to backup precision after the mixed-precision algorithm runs. Before, it was the other way around. This looks a bit cumbersome during mixed-precision assignment, but it avoids passing group_size_values around, which is an improvement over the previous approach.
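
For readers following along, a simplified sketch of the updated flow described above. The class and variable names are stand-ins, not the actual NNCF code.

from dataclasses import dataclass

@dataclass
class WeightParam:  # stand-in for the real weight parameter class
    name: str
    compression_config: str  # stand-in for WeightCompressionConfig

primary_config, backup_config = "int4_sym", "int8_asym"

# 1) Every ratio-defining parameter starts out with the primary config.
ratio_defining_params = [WeightParam(name, primary_config) for name in ("w1", "w2", "w3")]

# 2) The mixed-precision algorithm selects which parameters keep it (illustrative result).
primary_names = {"w1"}

# 3) The rest are demoted back to the backup precision, which is what removes
#    the need to pass group_size_values around.
for param in ratio_defining_params:
    if param.name not in primary_names:
        param.compression_config = backup_config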


  return {
-     "mode": mode,
+     "mode": mode if isinstance(mode, nncf.CompressWeightsMode) else nncf.CompressWeightsMode(mode),

What is the scenario where this is needed? Why can't we provide an instance of nncf.CompressWeightsMode instead of a string?
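
For context, this is what the coerced line does in isolation; the coercion only matters if mode can arrive as a plain string, e.g. from serialized parameters. A minimal sketch with a simplified stand-in for nncf.CompressWeightsMode:

from enum import Enum

class CompressWeightsMode(Enum):  # simplified stand-in for nncf.CompressWeightsMode
    INT8_ASYM = "int8_asym"
    INT4_SYM = "int4_sym"

mode = "int4_sym"  # e.g. a value read back from a config file
if not isinstance(mode, CompressWeightsMode):
    mode = CompressWeightsMode(mode)  # Enum(value) looks the member up by its value
assert mode is CompressWeightsMode.INT4_SYM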


return ratio_defining_params

def _get_backup_config(self, weight_dtype: TensorDataType) -> WeightCompressionConfig:

Suggested change
def _get_backup_config(self, weight_dtype: TensorDataType) -> WeightCompressionConfig:
def _get_backup_config(self, weight_dtype: TensorDataType) -> Optional[WeightCompressionConfig]:

model: TModel,
graph: NNCFGraph,
statistics_points: StatisticPointsContainer,
group_size_values: dict[str, int],

Please remove group_size_values from the docstring

Comment on lines +558 to +559
# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision

Suggested change
# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision
# At this point ratio_defining_params are all in primary precision. Below we update parameters
# which need to be set to the backup precision.


# ratio_defining_params are all in primary precision. Update parameters
# which need to be set to backup precision
for weight_param in ratio_defining_params:

Suggested change
for weight_param in ratio_defining_params:
primary_precision_weight_params = set(primary_precision_weight_params)
for weight_param in ratio_defining_params:

This avoids quadratic complexity.
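
A self-contained illustration of the difference (sizes are arbitrary): membership checks on a list scan it element by element, so the loop over ratio_defining_params becomes O(n * m), while set lookups are O(1) on average, assuming the elements are hashable.

params = list(range(10_000))
selected_list = [p for p in params if p % 2 == 0]
selected_set = set(selected_list)

demoted_fast = [p for p in params if p not in selected_set]   # O(n) in total
demoted_slow = [p for p in params if p not in selected_list]  # O(n * m), noticeably slower
assert demoted_fast == demoted_slow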

Comment on lines +181 to +182
Applies Weight Compression to the torch.fx.GraphModule provided model
using provided torch.ao quantizer.

Suggested change
Applies Weight Compression to the torch.fx.GraphModule provided model
using provided torch.ao quantizer.
Applies Weight Compression to the torch.fx.GraphModule model using provided torch.ao quantizer.

Comment on lines +187 to +188
:param dataset: A representative dataset for the
calibration process.

Suggested change
:param dataset: A representative dataset for the
calibration process.
:param dataset: A representative dataset for the calibration process.

Comment on lines +133 to +135
pt2e_params = PT2E_PARAMS
if qparam.get("mode") in {QuantizationMode.INT8WO_ASYM, QuantizationMode.INT8WO_SYM}:
    pt2e_params = [{}]

Suggested change
pt2e_params = PT2E_PARAMS
if qparam.get("mode") in {QuantizationMode.INT8WO_ASYM, QuantizationMode.INT8WO_SYM}:
    pt2e_params = [{}]
pt2e_params = [{}] if qparam.get("mode") in INT8_COMPRESSION_MODES else PT2E_PARAMS

Comment on lines +148 to +149
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", [1, 3, 64]),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", [5]),

Suggested change
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", [1, 3, 64]),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", [5]),
ModelCase(LlamaDecoderOnly, "LlamaDecoderOnly", (1, 3, 64)),
ModelCase(partial(ShortTransformer, 64, 128, True), "short_transformer_shared", (5,)),

quantizer_builder: Callable[..., OpenVINOQuantizer],
model_case: ModelCase,
quantizer_params,
pt2e_params,

I see that in this and some other cases below, the pt2e_params argument is not used. Is this on purpose? Won't it result in unnecessary duplication of tests?
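
For reference, a minimal pytest example of the duplication in question (the test name and parameter values are made up): pytest generates one test case per parametrized value even when the argument is never used, so identical test bodies run multiple times.

import pytest

@pytest.mark.parametrize("pt2e_params", [{}, {"some_option": True}])  # illustrative values
def test_compression(pt2e_params):
    # the body never touches pt2e_params, yet pytest still runs it once per value
    assert 2 + 2 == 4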
