Make low-precision MACA examples use portable emitters, launch shapes, and reference paths #34
Open
VitalyAnkh wants to merge 4 commits into
Fixes #28.
Problem
Several low-precision MACA examples still fail after the backend is repaired because they assume CUDA- or Hopper-specific emitters, launch shapes, and reference paths.
What this PR changes
Solution
The examples now make target-aware choices at the example layer. Where CUDA-only APIs or Hopper-tuned shapes were previously assumed, the MACA path now chooses a compatible emitter, a smaller launch configuration, or a portable reference implementation. This keeps the examples executable without changing their intended numerical checks.
Alternatives considered
A narrower alternative was to skip the affected examples on MACA and retain the existing CUDA-oriented logic. That would have left the xfails in place rather than fixing the underlying failures. Another possibility was to push all of the adaptation down into backend lowering. That would not help with example-side reference code that depends on APIs absent from the MACA runtime.
Verification
python -m pytest -q examples/maca/gemm_fp8/test_example_gemm_fp8.py
python -m pytest -q examples/maca/deepseek_deepgemm/test_example_deepgemm_fp8_2xAcc.py
python -m pytest -q examples/maca/dequantize_gemm/test_example_dequantize_gemm.py
Stack context
This PR builds on #33.