[OMNIML-2244] Add E2E example for mixed precision quantization and ONNX export #656
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##              main     #656      +/-   ##
==========================================
- Coverage    74.50%   74.46%    -0.05%
==========================================
  Files          183      183
  Lines        18400    18415      +15
==========================================
+ Hits         13709    13712       +3
- Misses        4691     4703      +12
==========================================
…NX export Signed-off-by: ajrasane <[email protected]>
We should add perf and accuracy numbers for the baseline and quantized models in the README file as well.
A basic query: is onnx_ptq the right place for "PyTorch PTQ => ONNX export" examples? I was under the impression that onnx_ptq exemplifies PTQ techniques for input ONNX models.
Right now,
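For context, the flow this example exercises is PTQ applied to the PyTorch model followed by ONNX export, as opposed to onnx_ptq's ONNX-in/ONNX-out PTQ. Below is a minimal sketch of that flow, assuming modelopt's documented `mtq.quantize` entry point and its `NVFP4_DEFAULT_CFG` preset; the torchvision model, single-batch calibration, and output path are placeholders, and it is also an assumption here that plain `torch.onnx.export` handles the inserted fake-quant ops end to end.

```python
# Sketch of the "PyTorch PTQ => ONNX export" flow (placeholder model/data).
import torch
import torchvision
import modelopt.torch.quantization as mtq

model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1").cuda().eval()
dummy = torch.randn(1, 3, 224, 224, device="cuda")

def forward_loop(m):
    # Calibration pass; a real run would iterate over validation batches.
    m(dummy)

# Insert NVFP4 fake-quant ops into the model and calibrate their ranges.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Export the quantized graph (Q/DQ-style ops) to ONNX.
torch.onnx.export(model, dummy, "vit_b_16.nvfp4.onnx")
```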
You'll need to change test from
| Model | Top-1 accuracy | Top-5 accuracy |
| :--- | :---: | :---: |
| Torch autocast (FP16) | 85.11% | 97.53% |
| NVFP4 Quantized | 84.558% | 97.36% |
| Auto Quantized (FP8 + NVFP4, 4.78 effective bits) | 84.726% | 97.434% |
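The "Auto Quantized" row mixes FP8 and NVFP4 per layer under an effective-bits budget. A hedged sketch of how such a search can be driven with modelopt's `auto_quantize`; the exact keyword names (`constraints`, `quantization_formats`, `data_loader`, `forward_step`) are assumptions about the current API, and `model` and `calib_loader` are placeholders:

```python
import modelopt.torch.quantization as mtq

# Per-layer format search under an effective-bits budget, mirroring the
# "FP8 + NVFP4, 4.78 effective bits" row above. Argument names are assumptions.
model, search_state = mtq.auto_quantize(
    model,                                   # placeholder: the FP16 baseline model
    constraints={"effective_bits": 4.8},     # budget the search must satisfy
    quantization_formats=["FP8_DEFAULT_CFG", "NVFP4_DEFAULT_CFG"],
    data_loader=calib_loader,                # placeholder calibration loader
    forward_step=lambda m, batch: m(batch),  # runs one calibration step
)
```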
Is "NVFP4 Quantized" NVFP4 + FP16 autocast? Same question for "Auto Quantized".
Yes
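Since the confirmation above means both quantized rows were evaluated with FP16 autocast wrapped around the quantized model, here is a sketch of that evaluation loop; `model` and `val_loader` are placeholders for the example's ViT and its validation set:

```python
import torch

# Top-1 accuracy under FP16 autocast: quantized layers run fake-quant,
# all remaining ops run in FP16. `model` and `val_loader` are placeholders.
correct = total = 0
model.eval()
with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    for images, labels in val_loader:
        preds = model(images.cuda()).argmax(dim=-1)
        correct += (preds == labels.cuda()).sum().item()
        total += labels.numel()
print(f"Top-1 accuracy: {correct / total:.2%}")
```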
gcunhase left a comment:
Thanks for dividing the examples into two folders.
I posted a couple more comments, and we'd still need to add the perf numbers to show the accuracy-runtime trade-offs.
Approving for now.
What does this PR do?
Type of change: New Feature
Overview:
Usage
Testing
Accuracy results
Reference accuracy for fp16
Before your PR is "Ready for review"