
Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
New Feature

Overview:

  • Created an abstract parent class, ONNXQuantExporter
  • Created child classes for the individual precisions
  • Implemented the INT4QuantExporter
  • Removed quantize_weights_to_int4
  • Added a method to quantize weights of the ONNX model to low precision
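The refactor above can be sketched roughly as follows. Only the class names ONNXQuantExporter and INT4QuantExporter come from this PR; the method names, signatures, and the clamping logic are illustrative assumptions, not the actual modelopt API:

```python
from abc import ABC, abstractmethod

class ONNXQuantExporter(ABC):
    """Abstract parent class: each precision provides its own weight quantizer."""

    @abstractmethod
    def quantize_weights(self, weights):
        """Quantize the ONNX model's weights to this exporter's precision."""

class INT4QuantExporter(ONNXQuantExporter):
    """Child class for INT4 export (hypothetical implementation)."""

    def quantize_weights(self, weights):
        # Round, then clamp each weight to the signed 4-bit range [-8, 7].
        return [max(-8, min(7, round(w))) for w in weights]

exporter = INT4QuantExporter()
print(exporter.quantize_weights([3.6, -9.2, 100.0]))  # [4, -8, 7]
```

With this shape, adding NVFP4 or MXFP8 later would mean adding another subclass rather than another standalone function like the removed quantize_weights_to_int4.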

Testing

python torch_quant_to_onnx.py --quantize_mode=int4_awq \
	--onnx_save_path=<onnx_path>

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane self-assigned this Nov 18, 2025
@ajrasane ajrasane requested review from a team as code owners November 18, 2025 18:14
@ajrasane ajrasane requested a review from i-riyad November 18, 2025 18:14
@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 23.52941% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.32%. Comparing base (a703e22) to head (83028fa).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/_deploy/utils/torch_onnx.py 18.75% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
- Coverage   74.45%   74.32%   -0.13%     
==========================================
  Files         182      182              
  Lines       18250    18164      -86     
==========================================
- Hits        13588    13501      -87     
- Misses       4662     4663       +1     

☔ View full report in Codecov by Sentry.

@gcunhase
Contributor

If this PR is just for INT4, and NVFP4 and MXFP8 are WIP, can you please update the title accordingly? Thanks!

@ajrasane ajrasane changed the title [OMNIML-2244] Create the ONNX quantization exporter [OMNIML-2244] Implement the ONNX quantization exporter for INT4 Nov 19, 2025
@ajrasane ajrasane requested a review from galagam November 19, 2025 11:03
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from e829998 to 83028fa Compare November 20, 2025 20:34