
Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
New Feature

Overview:

  • Created an abstract parent class, ONNXQuantExporter
  • Created child classes for the individual precisions
  • Implemented the INT4QuantExporter
  • Removed quantize_weights_to_int4
  • Added a method to quantize weights of the ONNX model to low precision
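The refactor above can be sketched roughly as follows. Only the class names ONNXQuantExporter and INT4QuantExporter come from this PR; the method names, signatures, and the clamping logic are illustrative assumptions, not the actual modelopt API:

```python
from abc import ABC, abstractmethod

class ONNXQuantExporter(ABC):
    """Abstract parent class: each precision provides its own weight quantizer."""

    @abstractmethod
    def quantize_weights(self, weights):
        """Quantize the ONNX model's weights to this exporter's precision."""

class INT4QuantExporter(ONNXQuantExporter):
    """Child class for INT4 export (hypothetical implementation)."""

    def quantize_weights(self, weights):
        # Round, then clamp each weight to the signed 4-bit range [-8, 7].
        return [max(-8, min(7, round(w))) for w in weights]

exporter = INT4QuantExporter()
print(exporter.quantize_weights([3.6, -9.2, 100.0]))  # [4, -8, 7]
```

With this shape, adding NVFP4 or MXFP8 later would mean adding another subclass rather than another standalone function like the removed quantize_weights_to_int4.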

Testing

python torch_quant_to_onnx.py --quantize_mode=int4_awq \
	--onnx_save_path=<onnx_path>

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane self-assigned this Nov 18, 2025
@ajrasane ajrasane requested review from a team as code owners November 18, 2025 18:14
@ajrasane ajrasane requested a review from i-riyad November 18, 2025 18:14
@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 23.52941% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.32%. Comparing base (a703e22) to head (83028fa).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/_deploy/utils/torch_onnx.py 18.75% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
- Coverage   74.45%   74.32%   -0.13%     
==========================================
  Files         182      182              
  Lines       18250    18164      -86     
==========================================
- Hits        13588    13501      -87     
- Misses       4662     4663       +1     

☔ View full report in Codecov by Sentry.

@gcunhase
Contributor

If this PR is just for INT4, and NVFP4 and MXFP8 are WIP, can you please update the title accordingly? Thanks!

@ajrasane ajrasane changed the title [OMNIML-2244] Create the ONNX quantization exporter [OMNIML-2244] Implement the ONNX quantization exporter for INT4 Nov 19, 2025
@ajrasane ajrasane requested a review from galagam November 19, 2025 11:03
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from e829998 to 83028fa Compare November 20, 2025 20:34