
Conversation

@realAsma
Contributor

@realAsma realAsma commented Nov 20, 2025

What does this PR do?

Type of change: New Feature

Overview:

This PR extends AutoQuantize with KL Divergence Loss-based sensitivity measurement as an alternative to the existing gradient-based approach. The KL Divergence search uses a binary searcher similar to the one in FastNAS.

Gradient-based AutoQuantize is faster than KL Divergence-based AutoQuantize; however, the KL Divergence approach does not require the model implementation to support gradient backward. In addition, the KL Divergence scores collected by AutoQuantize are useful for sensitivity analysis of the model, since KL Divergence is a more direct measure of sensitivity than gradient-based scores.
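At a high level, the sensitivity score for a layer/format candidate is the KL Divergence between the output distribution of the unquantized model and the output distribution with that candidate quantization applied. The snippet below is a minimal sketch of that computation in plain PyTorch; the function name and reduction choice are illustrative, not the exact implementation in this PR:

```python
import torch
import torch.nn.functional as F

def kl_div_sensitivity(logits_ref: torch.Tensor, logits_quant: torch.Tensor) -> float:
    """KL(P_ref || P_quant) computed from raw logits (conceptual sketch).

    `logits_ref` comes from a forward pass with quantization disabled,
    `logits_quant` from a forward pass with the candidate quantization enabled.
    """
    log_p_ref = F.log_softmax(logits_ref, dim=-1)
    log_p_quant = F.log_softmax(logits_quant, dim=-1)
    # F.kl_div expects input = log Q and target = log P when log_target=True;
    # "batchmean" is just one possible reduction over tokens.
    return F.kl_div(log_p_quant, log_p_ref, log_target=True, reduction="batchmean").item()
```

A larger divergence for a given layer/format candidate indicates that quantizing that layer perturbs the model's output distribution more, i.e., the layer is more sensitive.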

Usage

see tests/unit/torch/quantization/test_autoquant.py
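For orientation only, a call might look roughly like the sketch below. The `method="kl_div"` argument and the exact `mtq.auto_quantize` signature shown here are assumptions; the unit test above is the authoritative reference.

```python
import modelopt.torch.quantization as mtq

def run_autoquantize_kl(model, calib_loader):
    # Hypothetical usage sketch: argument names may differ from the actual API.
    model, search_state = mtq.auto_quantize(
        model,
        constraints={"effective_bits": 4.8},
        quantization_formats=["NVFP4_DEFAULT_CFG", "FP8_DEFAULT_CFG"],
        data_loader=calib_loader,
        forward_step=lambda m, batch: m(**batch),
        method="kl_div",  # hypothetical: select KL Divergence-based sensitivity
    )
    return model, search_state
```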

Testing

Tested with unit tests.

Result for Qwen3 8B
(results image attached to the PR)

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@realAsma realAsma requested review from a team as code owners November 20, 2025 23:11
@realAsma realAsma requested review from Edwardf0t1 and ajrasane and removed request for a team November 20, 2025 23:11
@copy-pr-bot

copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 197d4d6 to 9134ca9 Compare November 20, 2025 23:39
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch from 9ebd69f to b7bd107 Compare November 21, 2025 00:21
@realAsma realAsma requested a review from a team as a code owner November 21, 2025 00:21
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from dc15dae to 48b0423 Compare November 21, 2025 00:33
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 3 times, most recently from 60a0f26 to 0275c61 Compare November 21, 2025 17:56
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 48b0423 to 73fc080 Compare November 21, 2025 21:22
@realAsma realAsma requested a review from meenchen November 21, 2025 21:44
@realAsma realAsma requested a review from meenchen November 25, 2025 00:53
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 2 times, most recently from 6405f2b to d08a403 Compare November 25, 2025 20:21
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 46670e1 to 09d8a29 Compare November 25, 2025 20:24
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch from d08a403 to 2d8ad4d Compare November 25, 2025 21:44
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 09d8a29 to 75f83da Compare November 25, 2025 22:00
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch from 6ab013e to 4b72089 Compare November 25, 2025 23:17
Signed-off-by: Asma Kuriparambil Thekkumpate <[email protected]>

minor

Signed-off-by: Asma Kuriparambil Thekkumpate <[email protected]>

cherry-picked final PR changes

changelog updates

Signed-off-by: realAsma <[email protected]>

minor

Signed-off-by: realAsma <[email protected]>

KL Div formula fix

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 75f83da to 1b52477 Compare November 25, 2025 23:18
Some improvements for KLDiv

Signed-off-by: realAsma <[email protected]>

changelog update

Signed-off-by: realAsma <[email protected]>

minor

Signed-off-by: realAsma <[email protected]>

doc updates

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 6e3ad6f to 0aada4e Compare November 25, 2025 23:34
@realAsma realAsma merged commit 6467ec2 into asma/auto_quantize_improvements Nov 25, 2025
1 check passed
@realAsma realAsma deleted the asma/auto_quantize_kd_loss_sensitivity branch November 25, 2025 23:35
realAsma added a commit that referenced this pull request Nov 26, 2025
…AutoQuantizeGradientSearcher; separated quant modules and score modules (#586)

## What does this PR do?

**Type of change:** Refactor; Minor new feature

**Overview:**

1. Refactored AutoQuantizeSearcher into _AutoQuantizeBaseSearcher &
AutoQuantizeGradientSearcher - prepares the architecture for additional
search methods.
2. Separated quant modules and score modules - separating quantization
modules from scoring modules enables auto-quantization to measure
sensitivity at parent layers (e.g., MLP output for MoE experts) rather
than at individual ops; see the sketch after this list.
3. Also see #592
and #588
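
As a rough illustration of item 2 (a generic PyTorch sketch, not the ModelOpt code): scoring at a parent module can be done with a forward hook on that module, so the captured output reflects the combined effect of quantizing all of its children (e.g., the experts inside an MoE MLP). The helper below and its name are hypothetical:

```python
import torch
import torch.nn as nn

def capture_parent_output(parent_module: nn.Module, storage: list):
    """Record the parent module's output for sensitivity scoring.

    Hooking the parent (e.g., the MoE MLP block) captures the combined effect
    of quantizing its child expert layers, instead of scoring each expert
    linear op in isolation.
    """
    def hook(_module, _inputs, output):
        storage.append(output.detach())

    return parent_module.register_forward_hook(hook)
```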

## Testing
See unittests; `tests/unit/torch/quantization/test_autoquant.py` and
`tests/unit/torch/quantization/plugins/test_huggingface.py`

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not Required

## Additional Information



## Summary by CodeRabbit

* **New Features**
  * Added support for score modules in quantization workflows.
  * Added optional naming for quantization recipes.

* **Bug Fixes**
  * Improved quantization grouping rules documentation with clearer configuration examples.

* **Refactor**
  * Renamed quantization module parameters for improved clarity.
  * Enhanced quantization search architecture for better scalability.



---------

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
