[PyTorch][NVFP4][MOE] NVFP4 Grouped Quantize with Hadamard Transform #2411
Merged
timmoon10 merged 38 commits into NVIDIA:main from zhongbozhu:zhongbo/multi_rht_cast_colwise_fuse on Dec 20, 2025
d0dbe66
rowwise colwise RHT group quant v1
zhongbozhu b345534
remove local array RW
zhongbozhu 2eb23b3
change wait_barrier
zhongbozhu 004e529
fast math options
zhongbozhu d9a6c24
use mult to replace div
zhongbozhu 9b9efb8
format
zhongbozhu a9d0fc5
bulk move random states
zhongbozhu 1af82af
greptile
zhongbozhu b4515d2
lint
zhongbozhu 626e3fe
revert to use divides
zhongbozhu fc6f7f2
avoid fp32 bf16 round-trip in RHT cast fusion
zhongbozhu 48e5d75
trigger fastmath by toggle NVTE_RHT_CAST_FUSION_USE_FAST_MATH
zhongbozhu 3d07a9b
integrate row col rht fusion, functional
zhongbozhu 70523c8
numerics aligned
zhongbozhu 0388466
style
zhongbozhu 27f1047
remove device sync
zhongbozhu 380a116
128 padding
zhongbozhu f61979a
revert colwise rng state creation because of row-col fused kernel
zhongbozhu 6f38c78
fix CI, linter
zhongbozhu badcf74
refactor RS for generating two random values
zhongbozhu 0d245ae
Avoid invalid configs with templated kernel
timmoon10 83e7bf2
fix acc pipeline init with 0 arrival count
zhongbozhu b554bef
restore rowwise-only mode
zhongbozhu 247a20b
switch to dynamic atomic scheduler
zhongbozhu 4df34ce
Avoid instantiating group RHT+cast kernel without row-wise or col-wis…
timmoon10 cbdda20
Include fast math option in quantization config
timmoon10 0ac4d74
Fix linter warnings and review nits
timmoon10 d98b732
Merge branch 'main' into zhongbo/multi_rht_cast_colwise_fuse
timmoon10 c14b156
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 40ae64c
Use TE license
timmoon10 15e1edb
Fix bug where kernel is always launched on stream
timmoon10 79cc660
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8534c38
Restore BF16 intermediate downcast in fused RHT-cast kernels
timmoon10 57db30f
fix numerical test of grouped kernel
zhongbozhu b258ca9
Make sure row-wise and col-wise quantization use different RNG seeds
timmoon10 d79c2ac
Merge branch 'main' into zhongbo/multi_rht_cast_colwise_fuse
timmoon10 66ac756
Restore autoformatter
timmoon10 376687c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot]
We're copy-pasting the templated infrastructure for single-tensor quantization, and inheriting all of its complexity to handle unnecessary features.