[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca by ventijing · Pull Request #114 · tile-ai/tilelang-metax

ventijing · 2026-06-22T06:08:09Z

Maca mxcc rejects PTX inline asm in decode_fp4_to_bf16_twiddling.
Select the existing portable C++ path for Maca and gfx950, and infer
target from determine_target("auto") when not passed explicitly.
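
For context on what the portable path has to compute, here is a minimal Python sketch of plain MXFP4 decoding, assuming the standard E2M1 element layout with an E8M0 shared scale. It is not the C++ source that tilelang selects, and it ignores the twiddled bit layout. The function names and the low-nibble-first packing order are assumptions made for illustration.

```python
# Illustrative only: decodes plain (untwiddled) E2M1 nibbles with an E8M0 scale.
# These helper names are hypothetical and are not part of tilelang.

# Magnitudes for the 3 low bits (2 exponent bits, 1 mantissa bit, exponent bias 1).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 value: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    if not 0 <= nibble <= 0xF:
        raise ValueError(f"expected a 4-bit value, got {nibble}")
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def dequant_mxfp4_byte(packed: int, scale_e8m0: int) -> tuple:
    """Decode two FP4 values from one byte and apply the scale 2**(scale_e8m0 - 127).

    Assumption: the low nibble holds the first element.
    """
    scale = 2.0 ** (scale_e8m0 - 127)
    low = decode_e2m1(packed & 0xF) * scale
    high = decode_e2m1((packed >> 4) & 0xF) * scale
    return low, high
```

Because this is ordinary integer and floating-point arithmetic, the same logic can be written in C++ without any target-specific inline assembly, which is the property the PR relies on.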

Summary by CodeRabbit

Bug Fixes
- Improved MXFP dequantization compatibility by automatically choosing a portable dequantization path on targets that can’t run CUDA PTX inline assembly, fixing correctness for affected AMD/MACA configurations.
- Refined target selection so the MXFP implementation matches the active hardware capability set.
Chores
- Updated regression discovery to run benchmark driver scripts from the top-level examples directory instead of only examples/maca.
- Improved MACA CI setup by fully removing any existing flash_linear_attention/flash-linear-attention installs (including related metadata) before installing the specified MetaX version.

github-actions · 2026-06-22T06:08:20Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-06-22T06:08:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

tilelang/quantize/mxfp.py adds two helpers—_resolve_mxfp_target() and _target_uses_portable_mxfp_dequant()—and uses them to replace the ad-hoc AMD try/except detection in get_mxfp_intrin_group() with a generalized portable-path switch covering both maca and gfx950 targets. Imports are expanded accordingly. In parallel, maint/scripts/regression_all.py broadens regression test discovery to scan the full examples directory instead of only the maca subdirectory. The CI workflow fully resets and updates the MACA-specific toolkit setup to use a build-variant-pinned version of flash_linear_attention.

Changes

Portable MXFP dequant target detection

| Layer / File(s) | Summary |
|---|---|
| Target resolution and portable-path helpers `tilelang/quantize/mxfp.py` | Imports expanded to include `Target`, `determine_target`, and `target_is_gfx950`; `_resolve_mxfp_target()` selects the active target from explicit input, `Target.current()`, or `determine_target("auto")`; `_target_uses_portable_mxfp_dequant()` returns `True` for `maca` and `gfx950` targets. |
| get_mxfp_intrin_group portable routing `tilelang/quantize/mxfp.py` | Replaces the `_is_gfx950` try/except block with `_use_portable` from the new helpers; portable mode accepts only `source_bit=4` + `out_dtype=bfloat16`, routing to HIP twiddling or simple FP4→BF16 sources; other combinations raise `AssertionError`. |
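
The portable routing rule described in the table can be sketched as below. This is a simplified stand-in, not the code in `get_mxfp_intrin_group()`: the function name, the `use_twiddling` parameter, and the returned labels are hypothetical. Only the accepted combination and the error message follow the PR.

```python
def select_portable_source(source_bit: int, out_dtype: str, use_twiddling: bool) -> str:
    """Pick a portable dequant source, rejecting unsupported combinations.

    Hypothetical helper: the real function returns intrinsic source code,
    while this sketch returns only a label for the chosen path.
    """
    if source_bit != 4 or out_dtype != "bfloat16":
        raise AssertionError(
            "Portable MXFP dequant only supports source_bit=4 and out_dtype=bfloat16, "
            f"got source_bit={source_bit}, out_dtype={out_dtype}"
        )
    # Both branches avoid PTX inline asm; they differ only in bit layout handling.
    return "hip_twiddling" if use_twiddling else "simple_fp4_to_bf16"
```

Checking the combination before choosing a source means an unsupported request fails with a clear message instead of reaching a compiler that would reject the generated code.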

Expanded regression testing scope

| Layer / File(s) | Summary |
|---|---|
| Examples root directory expansion `maint/scripts/regression_all.py` | `_examples_root()` now points to the repository-wide examples directory instead of examples/maca, changing which benchmark driver files are discovered and executed by `regression_all()`. |

CI MACA toolkit dependency update

| Layer / File(s) | Summary |
|---|---|
| MACA toolkit dependency setup `.github/workflows/ci.yml` | The CI workflow's MACA-specific install path fully resets any existing `flash_linear_attention` artifacts, then installs `flash_linear_attention` pinned to `0.4.0+metax3.7.2.0torch2.8` using the same MetaX index and trusted-host settings. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

tile-ai/tilelang-metax#110: Updates MACA toolkit and flash_linear_attention dependencies in the same CI workflow file.

Poem

🐰 A rabbit once hopped past a PTX gate,
And found that some targets just couldn't relate.
So helpers were dug — two burrows so neat —
To sniff out the targets and choose the right street.
Now maca and gfx950 skip the asm maze,
While broader examples light regression days! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: enabling portable C++ MXFP4 dequantization on Maca instead of PTX inline assembly, which directly aligns with the core purpose of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

ventijing · 2026-06-22T06:09:13Z

@regression-perf

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tilelang/quantize/mxfp.py`:
- Around line 174-182: The function _target_uses_portable_mxfp_dequant currently
calls target_is_gfx950() unconditionally on all non-maca targets at the final
return statement. Since target_is_gfx950() is a ROCm-specific predicate and may
fail or reject non-ROCm targets (cuda, llvm, metal, etc.), add a guard check
before the target_is_gfx950() call to verify the target kind is ROCm-related.
Only invoke target_is_gfx950(target) if the target is actually a ROCm target;
otherwise return False to maintain safe failure behavior and prevent errors when
non-ROCm targets are passed from the call site.
- Around line 165-171: In the `_resolve_mxfp_target` function, modify the first
condition that currently checks `if target is not None:` to also exclude the
string "auto" by changing it to `if target is not None and target != "auto":`.
This ensures that when "auto" is passed as the target argument, it is not
returned directly but instead falls through to the `determine_target("auto",
return_object=True)` call on the last line, which properly resolves it to a
Target object rather than passing the unresolved "auto" string downstream.
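
Both findings above can be illustrated with a self-contained sketch. It uses a stand-in `FakeTarget` class and stand-in helpers rather than tilelang's `Target`, `determine_target`, and `target_is_gfx950`. The `kind` and `arch` fields, the `"hip"`/`"rocm"` kind names, and the auto-detected result are assumptions for illustration, and the `Target.current()` step is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FakeTarget:
    """Stand-in for a compiler target object; not tilelang's Target."""
    kind: str
    arch: str = ""


def fake_determine_target(_: str = "auto") -> FakeTarget:
    """Hypothetical stand-in for determine_target("auto", return_object=True)."""
    return FakeTarget(kind="maca")


def resolve_target(target: Union[FakeTarget, str, None]) -> Union[FakeTarget, str]:
    # Return an explicit target as-is, but let None and the literal "auto"
    # fall through so they are resolved to a concrete target object.
    if target is not None and target != "auto":
        return target
    return fake_determine_target("auto")


def is_gfx950(target: FakeTarget) -> bool:
    """Stand-in for the ROCm-specific predicate."""
    return target.arch == "gfx950"


def uses_portable_dequant(target: FakeTarget) -> bool:
    if target.kind == "maca":
        return True
    # Guard: only ask the ROCm-specific question about ROCm-like targets.
    if target.kind not in ("hip", "rocm"):
        return False
    return is_gfx950(target)
```

With the guard in place, a CUDA or LLVM target returns `False` without ever reaching the ROCm-specific predicate.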

📥 Commits

Reviewing files that changed from the base of the PR and between f5af19d and 1d335ad.

📒 Files selected for processing (1)

tilelang/quantize/mxfp.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@maint/scripts/regression_all.py`:
- Line 60: The merge logic using `if k not in merged` silently skips duplicate
benchmark names, which can hide regressions when the expanded examples scope
encounters multiple benchmarks with the same name across different drivers.
Modify the merge operation in the regression_all() function to detect when a key
already exists in the merged dictionary and raise an exception or error with
details about the collision, rather than silently ignoring the duplicate entry.
Alternatively, namespace the keys by including the source file path along with
the benchmark name to ensure uniqueness while preserving all results.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d335ad and 1a6a8cb.

📒 Files selected for processing (1)

maint/scripts/regression_all.py

coderabbitai · 2026-06-22T07:17:22Z


 def _examples_root() -> Path:
-    return Path(__file__).resolve().parents[2] / "examples" / "maca"
+    return Path(__file__).resolve().parents[2] / "examples"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Expanded examples scope can silently drop benchmark results due to name collisions.

By switching discovery to the repo-wide examples tree (Line 60), regression_all() now has a much higher chance of encountering duplicate benchmark names across different drivers. The current merge logic (if k not in merged) silently ignores later entries, which can hide regressions and undercount totals.

A safer approach is to either fail on duplicate names or namespace keys by file path when merging.

Suggested fix (fail fast on duplicate benchmark names)

@@
-    for k, v in parsed.items():
-        if k not in merged:
-            merged[k] = v
-            _RESULTS.append(PerfResult(name=k, latency=v))
+    for k, v in parsed.items():
+        if k in merged:
+            failures.append(
+                f"{rel_path}\nDuplicate benchmark name detected: {k!r}. "
+                "Benchmark names must be globally unique across discovered drivers."
+            )
+            continue
+        merged[k] = v
+        _RESULTS.append(PerfResult(name=k, latency=v))
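
The other option mentioned in the comment, namespacing keys by file path, might look like the sketch below. The function name, the `::` separator, and the input shape (pairs of relative path and parsed results) are assumptions, not the structure used in `regression_all.py`.

```python
from typing import Dict, Iterable, Tuple


def merge_namespaced(
    per_driver: Iterable[Tuple[str, Dict[str, float]]],
) -> Dict[str, float]:
    """Merge per-driver latency results, keyed by "<relative path>::<benchmark name>".

    Hypothetical helper: prefixing with the driver path keeps results from
    different drivers distinct even when they reuse a benchmark name.
    """
    merged: Dict[str, float] = {}
    for rel_path, parsed in per_driver:
        for name, latency in parsed.items():
            merged[f"{rel_path}::{name}"] = latency
    return merged


results = merge_namespaced(
    [
        ("examples/gemm/bench.py", {"gemm": 1.25}),
        ("examples/maca/bench.py", {"gemm": 1.5}),
    ]
)
```

Compared with failing fast, this keeps every result but changes the reported key names, so anything that consumes the results by benchmark name would need to be updated.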

Five-HZ · 2026-06-24T06:07:13Z

-                f"AMD gfx950 MXFP dequant only supports source_bit=4 and out_dtype=bfloat16, "
+                f"Portable MXFP dequant only supports source_bit=4 and out_dtype=bfloat16, "
                f"got source_bit={source_bit}, out_dtype={out_dtype}"
            )


Add an `if` branch here, like the one for gfx950, to resolve the same issue for the maca target.

[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca

1d335ad

jcao added 2 commits June 22, 2026 06:58

[MetaxGPU][regression] Default regression_all scan root to examples

b3ac85a

[MetaxGPU][regression] Default regression_all scan root to examples

1a6a8cb

jcao added 10 commits June 22, 2026 08:55

[CI] Update flash_linear_attention version to 0.4.0+metax3.7.2.0torch2.8

3393e0b

[CI] Update flash_linear_attention version

a09c835

[CI] Update flash_linear_attention version

3ce4d43

[CI] Update flash_linear_attention version

3eca72d

[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca

a66f8b6

[CI] Update flash_linear_attention version

6959071

[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca

eb23897

[CI] Update flash_linear_attention version

e78ebdd

[CI] Update flash_linear_attention version

3404026

[CI] Update flash_linear_attention version

6ed645f

[MetaxGPU][quantize] Use portable C++ MXFP4 dequant on Maca #114
ventijing wants to merge 13 commits into tile-ai:dev from ventijing:dev_dequantize
