[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca#114
[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca#114ventijing wants to merge 13 commits into
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough
ChangesPortable MXFP dequant target detection
Expanded regression testing scope
CI MACA toolkit dependency update
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@regression-perf |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tilelang/quantize/mxfp.py`:
- Around line 174-182: The function _target_uses_portable_mxfp_dequant currently
calls target_is_gfx950() unconditionally on all non-maca targets at the final
return statement. Since target_is_gfx950() is a ROCm-specific predicate and may
fail or reject non-ROCm targets (cuda, llvm, metal, etc.), add a guard check
before the target_is_gfx950() call to verify the target kind is ROCm-related.
Only invoke target_is_gfx950(target) if the target is actually a ROCm target;
otherwise return False to maintain safe failure behavior and prevent errors when
non-ROCm targets are passed from the call site.
- Around line 165-171: In the `_resolve_mxfp_target` function, modify the first
condition that currently checks `if target is not None:` to also exclude the
string "auto" by changing it to `if target is not None and target != "auto":`.
This ensures that when "auto" is passed as the target argument, it is not
returned directly but instead falls through to the `determine_target("auto",
return_object=True)` call on the last line, which properly resolves it to a
Target object rather than passing the unresolved "auto" string downstream.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@maint/scripts/regression_all.py`:
- Line 60: The merge logic using `if k not in merged` silently skips duplicate
benchmark names, which can hide regressions when the expanded examples scope
encounters multiple benchmarks with the same name across different drivers.
Modify the merge operation in the regression_all() function to detect when a key
already exists in the merged dictionary and raise an exception or error with
details about the collision, rather than silently ignoring the duplicate entry.
Alternatively, namespace the keys by including the source file path along with
the benchmark name to ensure uniqueness while preserving all results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a41f2eaf-ff55-4680-b140-e7218c64cc67
📒 Files selected for processing (1)
maint/scripts/regression_all.py
|
|
||
| def _examples_root() -> Path: | ||
| return Path(__file__).resolve().parents[2] / "examples" / "maca" | ||
| return Path(__file__).resolve().parents[2] / "examples" |
There was a problem hiding this comment.
Expanded examples scope can silently drop benchmark results due to name collisions.
By switching discovery to the repo-wide examples tree (Line 60), regression_all() now has a much higher chance of encountering duplicate benchmark names across different drivers. The current merge logic (if k not in merged) silently ignores later entries, which can hide regressions and undercount totals.
A safer approach is to either fail on duplicate names or namespace keys by file path when merging.
Suggested fix (fail fast on duplicate benchmark names)
@@
- for k, v in parsed.items():
- if k not in merged:
- merged[k] = v
- _RESULTS.append(PerfResult(name=k, latency=v))
+ for k, v in parsed.items():
+ if k in merged:
+ failures.append(
+ f"{rel_path}\nDuplicate benchmark name detected: {k!r}. "
+ "Benchmark names must be globally unique across discovered drivers."
+ )
+ continue
+ merged[k] = v
+ _RESULTS.append(PerfResult(name=k, latency=v))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@maint/scripts/regression_all.py` at line 60, The merge logic using `if k not
in merged` silently skips duplicate benchmark names, which can hide regressions
when the expanded examples scope encounters multiple benchmarks with the same
name across different drivers. Modify the merge operation in the
regression_all() function to detect when a key already exists in the merged
dictionary and raise an exception or error with details about the collision,
rather than silently ignoring the duplicate entry. Alternatively, namespace the
keys by including the source file path along with the benchmark name to ensure
uniqueness while preserving all results.
| f"AMD gfx950 MXFP dequant only supports source_bit=4 and out_dtype=bfloat16, " | ||
| f"Portable MXFP dequant only supports source_bit=4 and out_dtype=bfloat16, " | ||
| f"got source_bit={source_bit}, out_dtype={out_dtype}" | ||
| ) |
There was a problem hiding this comment.
add if branch in here like gfx950 to resolve same issue with maca target.
Maca mxcc rejects PTX inline asm in decode_fp4_to_bf16_twiddling.
Select the existing portable C++ path for Maca and gfx950, and infer
target from determine_target("auto") when not passed explicitly.
Summary by CodeRabbit
Bug Fixes
Chores
examplesdirectory instead of onlyexamples/maca.flash_linear_attention/flash-linear-attentioninstalls (including related metadata) before installing the specified MetaX version.