[feat] Add flydsl based grouped gemm #384
Merged
Commits
13 commits
fea6673 flydsl grouped fp8 GEMM: M-grouped fwd(NT)/dgrad(NN) + variable-K wgr… (kyle-256)
e7541a3 flydsl grouped fp8: wire backend into grouped GEMM dispatch (kyle-256)
637cfa5 flydsl grouped fp8: merge per-layout persistent/non-persistent NT & N… (kyle-256)
0fee9d4 flydsl grouped fp8: drop persistent wgrad, keep only the masked chunk… (kyle-256)
b1d3ab6 flydsl grouped fp8: int64 indexing for inputs + output (handles A/B/C… (kyle-256)
f7975b7 flydsl dense fp8: int64 inputs (foldable + <=4GB traversal) + mode-sp… (kyle-256)
e3e44b0 flydsl grouped fp8 bwd: small-M dgrad/wgrad autotune + skew-robust wgrad (kyle-256)
e0cebdd flydsl grouped wgrad: share masked/persistent tile decode + i64 rebase (kyle-256)
bfd4510 rename fp8_gemm_helper -> gemm_helper + rebase main
2ab4072 test(grouped_gemm): add FlyDSL backend coverage to tensorwise tests
0c59ff4 fix(flydsl grouped): nt_vmcnt=-1 causes data hazard in non-persistent… (kyle-256)
ccb450e fix(flydsl utils): restore missing helpers + fix output store bugs (kyle-256)
d21b6ed feat(flydsl gemm): i64 traversal SRD re-base to lift the k*n / k*m < … (kyle-256)